开发者

Upload and parse csv file with google app engine

I'm wondering if anyone with a better understanding of python and gae can help me with this. I am uploading a csv file from a form to the gae datastore.

class CSVImport(webapp.RequestHandler):
  def post(self):
     csv_file = self.request.get('csv_import')
     fileReader = csv.reader(csv_file)
     for row in fileReader:       
       self.response.out.write(row) 

I'm running into the same problem that someone else mentions here - http://groups.google.com/group/google-appengine/browse_thread/thread/bb2d0b1a80ca7ac2/861c8241308b9717

That is, the csv.reader is iterating over each character and not the line. A google engineer left this explanation:

The call self.request.get('csv') returns a String. When you iterate over a string, you iterate over the characters, not the lines. You can see the difference here:

 class ProcessUpload(webapp.RequestHandler): 
   def post(self): 
     self.response.out.write(self.request.get('csv')) 
     file = open(os.path.join(os.path.dirname(__file__), 'sample.csv')) 
     self.response.out.write(file) 

     # Iterating over a file 
     fileReader = csv.reader(file) 
     for row in fileReader: 
       self.response.out.write(row) 

     # Iterating over a string 
     fileReader = csv.reader(self.request.get('csv')) 
     for row in fileReader: 
       self.r开发者_StackOverflow社区esponse.out.write(row) 

I really don't follow the explanation, and was unsuccessful implementing it. Can anyone provide a clearer explanation of this and a proposed fix?

Thanks, August


Short answer, try this:

fileReader = csv.reader(csv_file.split("\n"))

Long answer, consider the following:

for thing in stuff:
  print thing.strip().split(",")

If stuff is a file pointer, each thing is a line. If stuff is a list, each thing is an item. If stuff is a string, each thing is a character.

Iterating over the object returned by csv.reader is going to give you behavior similar to iterating over the object passed in, only with each item CSV-parsed. If you iterate over a string, you'll get a CSV-parsed version of each character.


I can't think of a clearer explanation than what the Google engineer you mentioned said. So let's break it down a bit.

The Python csv module operates on file-like objects, that is a file or something that behaves like a Python file. Hence, csv.reader() expects to get a file object as it's only required parameter.

The webapp.RequestHandler request object provides access to the HTTP parameters that are posted in the form. In HTTP, parameters are posted as key-value pairs, e.g., csv=record_one,record_two. When you invoke self.request.get('csv') this returns the value associated with the key csv as a Python string. A Python string is not a file-like object. Apparently, the csv module is falling-back when it does not understand the object and simply iterating it (in Python, strings can be iterated over by character, e.g., for c in 'Test String': print c will print each character in the string on a separate line).

Fortunately, Python provides a StringIO class that allows a string to be treated as a file-like object. So (assuming GAE supports StringIO, and there's no reason it shouldn't) you should be able to do this:

class ProcessUpload(webapp.RequestHandler): 
   def post(self): 
     self.response.out.write(self.request.get('csv')) 

     # Iterating over a string as a file 
     stringReader = csv.reader(StringIO.StringIO(self.request.get('csv')))
     for row in stringReader: 
        self.response.out.write(row) 

Which will work as you expect it to.

Edit I'm assuming that you are using something like a <textarea/> to collect the csv file. If you're uploading an attachment, different handling may be necessary (I'm not all that familiar with Python GAE or how it handles attachments).


You need to call csv_file = self.request.POST.get("csv_import") and not csv_file = self.request.get("csv_import").

The second one just gives you a string as you mentioned in your original post. But accessing via self.request.POST.get gives you a cgi.FieldStorage object.

This means that you can call csv_file.filename to get the object’s filename and csv_file.type to get the mimetype. Furthermore, if you access csv_file.file, it’s a StringO object (a read-only object from the StringIO module), not just a string. As ig0774 mentioned in his answer, the StringIO module allows you to treat a string as a file.

Therefore, your code can simply be:

class CSVImport(webapp.RequestHandler):
  def post(self):
     csv_file = self.request.POST.get('csv_import')
     fileReader = csv.reader(csv_file.file)
     for row in fileReader:
       # row is now a list containing all the column data in that row
       self.response.out.write(row)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜