How do I import a Unicode (UTF-8) CSV file into a NumPy array?
I'm not trying to do this in a smart or fast way, just trying to do it at all.
I have a file that looks like this:
$ cat all_user_token_counts.csv
@5raphaels,in,15
@5raphaels,for,15
@5raphaels,unless,11
@5raphaels,you,11
I know it's UTF-8 encoded because I created it, like this:
debug('opening ' + ALL_USER_TOKEN_COUNTS_FILE)
file = codecs.open(ALL_USER_TOKEN_COUNTS_FILE, encoding="utf-8",mode= "w")
for (user, token) in tokenizer.get_tokens_from_all_files():
    #... count tokens ..
    file.write(unicode(username +","+ token +","+ str(count) +"\r\n"))
I want to read it into a NumPy array so it looks something like this:
array([[u'@5raphaels', u'in', 15],
[u'@5raphaels', u'for', 15],
[u'@5raphaels', u'unless', 11]],
dtype=('<U10', '<U10', int))
While experimenting in the process of writing this question, it occurred to me that this may not even be possible. If so, I'd love to know!
Thanks in advance!
This can be done easily with np.loadtxt:
import numpy as np
arr = np.loadtxt('all_user_token_counts.csv', delimiter=',',
                 dtype='<U10,<U10,int')
print(arr)
# [(u'@5raphaels', u'in', 15) (u'@5raphaels', u'for', 15)
# (u'@5raphaels', u'unless', 11) (u'@5raphaels', u'you', 11)]
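On Python 3 with a reasonably recent NumPy, a minimal sketch of the same idea might look like the one below. It assumes your np.loadtxt accepts the encoding argument (added in NumPy 1.14, as far as I know). The field names user, token, and count are made up for illustration, the temporary file stands in for all_user_token_counts.csv, and the last sample row is a hypothetical addition so that a non-ASCII token gets exercised.

```python
import os
import tempfile

import numpy as np

# Sample rows from the question, plus one hypothetical non-ASCII row
# (not in the original file) to exercise UTF-8 decoding.
rows = (
    u"@5raphaels,in,15\n"
    u"@5raphaels,for,15\n"
    u"@5raphaels,unless,11\n"
    u"@5raphaels,you,11\n"
    u"@5raphaels,caf\u00e9,3\n"
)

# A temporary file stands in for all_user_token_counts.csv.
path = os.path.join(tempfile.mkdtemp(), "all_user_token_counts.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write(rows)

# Hypothetical field names; a 'U10' field holds at most 10 characters.
dtype = [("user", "U10"), ("token", "U10"), ("count", int)]

# comments=None turns off comment handling, so a '#' inside a token
# would not cut the line short (the default comment marker is '#').
arr = np.loadtxt(path, delimiter=",", dtype=dtype,
                 encoding="utf-8", comments=None)

print(arr["token"])        # all tokens as a Unicode string array
print(arr["count"].sum())  # total of the integer column
```

The result is a 1-D structured array rather than a 2-D array, so columns are accessed by field name (arr["token"]) instead of by index. If tokens can be longer than 10 characters, a wider width such as U50 would be needed.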