
Reading tab-separated value content stored in a GAE blob using universal-newline mode or an equivalent

I'm trying to read the contents of a TSV file as part of a Google App Engine application.

I can read from a file fine by using:

f=csv.reader(open(matrixpath, "rU"),dialect='excel-tab')

However, I now need to read the data from the blobstore using BlobReader:

blob_key = ...
blobdata = blobstore.BlobReader(blob_key)
f=csv.reader(blobdata,dialect='excel-tab')

(I've uploaded a copy of the entire code that I'm having this issue with here)

Without the "rU" argument, I get a "new-line character seen in unquoted field" error:

Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?

I would like to either fix my file so that I do not get this error, or emulate universal-newline mode when reading from the blobstore. How can I do either?

My file is around 20MB, and a cut down sample of it (that the script still fails on) can be found here.


I cannot reproduce the error directly from the sample file. Can you?

Given blob = open('sample-file.tsv', 'rb').read():

  1. reader = csv.reader(blob, dialect='excel-tab') produces a zillion or so one-byte fields, as expected.

  2. Substituting StringIO.StringIO(blob) or blob.splitlines() produces 50 rows each with about 10000 columns ... appears to be working correctly.
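The two observations above can be reproduced with a small sketch. It assumes Python 3 and a made-up two-row sample with bare "\r" line endings (not the poster's actual file): csv.reader iterates whatever it is given, so a plain string is consumed one character at a time, while the list returned by splitlines() is consumed one line at a time.

```python
import csv

# Hypothetical stand-in for the blob contents: two TSV rows with
# classic Mac (bare "\r") line endings. Not the poster's real data.
blob = "a\tb\rc\td\r"

# (1) Passing the string itself: csv.reader iterates over it character
# by character, so every "line" it sees is a single character.
rows_from_string = list(csv.reader(blob, dialect="excel-tab"))

# (2) Splitting first: str.splitlines() breaks on "\r", "\n" and "\r\n",
# so csv.reader receives one complete line per iteration.
rows_from_lines = list(csv.reader(blob.splitlines(), dialect="excel-tab"))

print(rows_from_lines)
```

In case (1) no field can be longer than one character, which matches the "one-byte fields" result described above.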

Unless you show (1) your blob uploading code (and URL of relevant docs) (2) your code that is getting the error on GAE, further assistance doesn't appear to be possible.


From "Upload and parse csv file with "universal newline" in python on Google App Engine", the following answer worked for me:

csv.reader(blob.open().read().splitlines())

This let me read a Mac-formatted CSV file on GNU/Linux. For a TSV file, also pass dialect='excel-tab' as in the question.
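As a rough sketch of how the splitlines() approach might be wrapped for the TSV case in the question, the helper below accepts any file-like object with a read() method. The name read_tsv_rows, the utf-8 default and the sample data are assumptions for illustration, and io.BytesIO stands in for blobstore.BlobReader so that the snippet runs outside App Engine. It is written for Python 3, where csv.reader expects text rather than bytes.

```python
import csv
import io


def read_tsv_rows(fileobj, encoding="utf-8"):
    """Read a whole file-like object and parse it as TSV.

    Hypothetical helper: loads everything into memory, then splits on
    any of "\r", "\n" or "\r\n" before handing the lines to csv.reader.
    """
    data = fileobj.read()
    if isinstance(data, bytes):
        # The encoding is an assumption; adjust it to match the file.
        data = data.decode(encoding)
    return list(csv.reader(data.splitlines(), dialect="excel-tab"))


# io.BytesIO is only a stand-in for a BlobReader here.
fake_blob = io.BytesIO(b"gene\ts1\ts2\rg1\t0.5\t1.2\r")
rows = read_tsv_rows(fake_blob)
print(rows)
```

Two trade-offs are worth keeping in mind: the whole blob is held in memory at once (probably acceptable for a file of about 20MB), and splitlines() would also break up a quoted field that contains an embedded newline.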
