
Ruby Mechanize: how to read a downloaded binary CSV file

I'm not very familiar with handling binary data in Ruby. I'm using Mechanize to download a large number of CSV files to my local disk. I then need to search these files for specific strings.

I use the save_as method in mechanize to save the file (which saves the file as binary). The content type of the file (according to mechanize) is:

application/vnd.ms-excel;charset=x-UTF-16LE-BOM

From here, I'm not sure how to read the file. I've tried reading it in as a normal file in ruby, but I just get the binary data. I've also tried just using standard unix tools (strings/grep) to try and search without any luck.

When I run the 'file' command on one of the files, I get:

foo.csv: Little-endian UTF-16 Unicode Pascal program text, with very long lines, with CRLF, CR, LF line terminators

I can see the data just fine with cat or vi. With vi I also see some control characters.

I've also tried both the csv and fastercsv ruby libraries, but I get an 'IllegalFormatError' exception with both. I've also tried this solution without any luck.

Any help would be greatly appreciated. Thanks.


You can use the 'iconv' command to convert the file to UTF-8:

# iconv -f 'UTF-16LE' -t 'UTF-8' bad_file.csv > good_file.csv

There is also a wrapper for iconv in the standard library; you could use it to convert the file after reading it into your program.
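
If you would rather do the conversion inside Ruby, here is a minimal sketch, assuming Ruby 1.9 or later, where String#encode is available (the Iconv wrapper was removed from the standard library in Ruby 2.0). It reads the raw bytes, labels them as UTF-16LE, transcodes to UTF-8, and drops a leading BOM if there is one. The helper names read_utf16le_as_utf8 and matching_lines are made up for illustration; they are not part of Mechanize or the csv library. This is probably also why grep and strings found nothing: in UTF-16 every ASCII character is followed by a NUL byte, so the plain byte pattern never matches.

```ruby
# Hypothetical helper: return the contents of a UTF-16LE file as a UTF-8 string.
def read_utf16le_as_utf8(path)
  raw = File.binread(path)                               # raw bytes, no transcoding
  text = raw.force_encoding("UTF-16LE").encode("UTF-8")  # label the bytes, then convert
  text.sub(/\A\uFEFF/, "")                               # strip the byte order mark if present
end

# Hypothetical helper: return the lines of the file that contain `needle`.
# The file mixes CRLF, CR and LF terminators, so split on all three.
def matching_lines(path, needle)
  read_utf16le_as_utf8(path)
    .split(/\r\n|\r|\n/)
    .select { |line| line.include?(needle) }
end
```

Calling matching_lines("foo.csv", "some string") should then give you the matching lines as ordinary UTF-8 strings, and the converted text can be handed to the csv library instead of the raw file.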
