How do I import a Unicode (UTF-8) CSV file into a NumPy array?

I'm not trying to do this cleverly or quickly, just trying to do it at all.

I have a file that looks like this:

$ cat all_user_token_counts.csv  
@5raphaels,in,15
@5raphaels,for,15
@5raphaels,unless,11
@5raphaels,you,11

I know it's UTF-8 encoded because I created it myself, like this:

    import codecs

    debug('opening ' + ALL_USER_TOKEN_COUNTS_FILE)
    file = codecs.open(ALL_USER_TOKEN_COUNTS_FILE, encoding="utf-8", mode="w")
    for (user, token) in tokenizer.get_tokens_from_all_files():
        # ... count tokens ...
        file.write(unicode(user + "," + token + "," + str(count) + "\r\n"))
    file.close()
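In Python 3, unicode() no longer exists and the built-in open() accepts an encoding argument directly. A minimal sketch of writing the same kind of file with the standard csv module, assuming the rows are the sample lines from the file above; the temporary path is hypothetical and only keeps the example self-contained. Opening with newline="" is what the csv module documentation recommends, so the writer controls the line endings itself ("\r\n" by default, matching the original code).

```python
import csv
import os
import tempfile

# Sample (user, token, count) rows taken from the file shown in the question.
rows = [("@5raphaels", "in", 15), ("@5raphaels", "for", 15),
        ("@5raphaels", "unless", 11), ("@5raphaels", "you", 11)]

# Hypothetical output location, used only for this demo.
path = os.path.join(tempfile.mkdtemp(), "all_user_token_counts.csv")

# encoding="utf-8" replaces codecs.open; newline="" leaves line endings to csv.writer.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    for user, token, count in rows:
        writer.writerow([user, token, count])
```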

I want to read it into a NumPy array so that it looks something like this:

    array([[u'@5raphaels', u'in', 15],
           [u'@5raphaels', u'for', 15],
           [u'@5raphaels', u'unless', 11]],
           dtype=('<U10', '<U10', int))

As I experimented while writing this question, it occurred to me that this may not even be possible. If so, I'd love to know!
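It is possible: a NumPy array has a single dtype, but that dtype can be a structured type with one named field per column, each field having its own type. A minimal sketch building such an array by hand from the sample rows; the field names user, token and count and the 10-character string width are illustrative assumptions, not anything NumPy requires.

```python
import numpy as np

# One named field per CSV column, each with its own type.
# Field names and the 10-character width are assumptions for illustration.
row_type = np.dtype([("user", "U10"), ("token", "U10"), ("count", int)])

# Each row must be a tuple (not a list) to fill a structured dtype.
arr = np.array([("@5raphaels", "in", 15),
                ("@5raphaels", "for", 15),
                ("@5raphaels", "unless", 11)], dtype=row_type)

print(arr["token"])         # ['in' 'for' 'unless']
print(arr["count"].sum())   # 41
```

The result is a one-dimensional array of records rather than a 2-D grid, so columns are selected by field name instead of by a second index.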

Thanks in advance!


This can be done easily with np.loadtxt:

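    import numpy as np

    # One field per CSV column: two Unicode strings of at most 10 characters and an int.
    # encoding='utf-8' decodes the file as UTF-8 (the parameter needs NumPy 1.14 or later).
    arr = np.loadtxt('all_user_token_counts.csv', delimiter=',',
                     dtype='U10,U10,int', encoding='utf-8')
    print(arr)

    # [(u'@5raphaels', u'in', 15) (u'@5raphaels', u'for', 15)
    #  (u'@5raphaels', u'unless', 11) (u'@5raphaels', u'you', 11)]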
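On Python 3, a sketch of the same loadtxt approach that names the fields instead of relying on the default names f0, f1 and f2 that a comma-separated dtype string produces. It passes encoding="utf-8" explicitly so the file is decoded as UTF-8. The field names, the wider 32-character string width and the temporary file path are assumptions made for this self-contained example.

```python
import os
import tempfile

import numpy as np

# Recreate the sample file from the question in a temporary directory (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "all_user_token_counts.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("@5raphaels,in,15\r\n@5raphaels,for,15\r\n"
            "@5raphaels,unless,11\r\n@5raphaels,you,11\r\n")

# Named fields; names and the 32-character width are illustrative choices.
row_type = np.dtype([("user", "U32"), ("token", "U32"), ("count", int)])

# Decode the file as UTF-8 text and parse each line into one record.
arr = np.loadtxt(path, delimiter=",", dtype=row_type, encoding="utf-8")

print(arr["token"])                     # ['in' 'for' 'unless' 'you']
print(arr[arr["count"] > 11]["token"])  # ['in' 'for']
```

Named fields make later code such as arr["count"] self-describing; the wider string width is a guard against longer usernames or tokens, since a U10 field only has room for 10 characters.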