Handling foreign characters in HTTP POST with GAE Python

It puzzles me that I have two functions handling HTTP POST and one of them breaks foreign characters, even though in both I just call self.request.POST.get('text') to get the value. The difference I see is that the one that breaks inherits from BlobstoreUploadHandler, so I suspect the problem has to do with that change. I don't understand, for example, why ÅÄÖ first works and then, after a seemingly unrelated change, suddenly any non-ASCII character gets mangled.

Please help me understand how Python should work with Unicode and UTF-8.

I have the complete code for both examples, one that works and one that distorts foreign characters like ÅÄÖ. I just need to know what to change; I think it should be possible to adjust it so that it behaves as expected.

To pin down the problem, it may help to know that if I input ÅÄÖ, the output becomes xcTW when it should be ÅÄÖ.

The two pieces of code are:

class AList(RequestHandler, I18NHandler):
  ...
  a.text = self.request.POST.get('text')

The above works. Then I changed it to:

class AList(RequestHandler, I18NHandler, blobstore_handlers.BlobstoreUploadHandler):
  ...
  a.text = self.request.POST.get('text')

This seems to be the only difference. One idea I have is to deploy both examples in the same app to see what is really causing the issue, since it may or may not be in the code I pasted here.

This is also a production-only issue: locally, foreign characters work as expected.

It seems to be related to the use of BlobstoreUploadHandler, since the following handler reproduces the garbled characters in the email it sends:

class ContactUploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        message = mail.EmailMessage(sender='admin@myapplicationatappspot.com', subject=self.request.POST.get('subject'))
        message.body = ('%s \nhttp://www.myapplicationatappspot.com/') % ( self.request.POST.get('text') )
        message.to='info@myapplicationatappspot.com'
        message.send()
        self.redirect('/service.html')


It looks like you've hit this bug: http://code.google.com/p/googleappengine/issues/detail?id=2749

As a workaround until it gets fixed, you can encode all your input in base64 using JavaScript. It's not ideal, but it did the trick for me.
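
A minimal sketch of the server side of that workaround, written in Python 3 (the thread's code is Python 2). It assumes the browser base64-encodes the UTF-8 bytes of each field before submitting, for example with the common `btoa(unescape(encodeURIComponent(text)))` idiom; `decode_client_field` is a hypothetical helper name, not an App Engine API.

```python
import base64
import binascii


def decode_client_field(value, encoding="utf-8"):
    """Decode a form field that the browser base64-encoded before submitting.

    Hypothetical helper: assumes the client base64-encoded the field's
    UTF-8 bytes. Raises ValueError if the value is not valid base64 or the
    decoded bytes are not valid in `encoding`.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("field is not valid base64: %r" % (value,)) from exc
    # UnicodeDecodeError is a subclass of ValueError
    return raw.decode(encoding)


if __name__ == "__main__":
    # Simulate what the client would send for the text "ÅÄÖ".
    sent = base64.b64encode("ÅÄÖ".encode("utf-8")).decode("ascii")
    print(sent, "->", decode_client_field(sent))
```

Because every value goes through base64, ASCII and non-ASCII input take the same path, which avoids guessing whether a given field was encoded.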


xcTW is the result of base64-encoding the cp1252 or Latin-1 encoding of those three characters; see the following IDLE session:

>>> import base64; print repr(base64.b64decode('xcTW'))
'\xc5\xc4\xd6'
>>> print repr('ÅÄÖ')
'\xc5\xc4\xd6'
>>>
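
The same check in Python 3, where text and bytes are separate types and the encoding step has to be explicit (a sketch, not part of the original session). Base64 of the single-byte cp1252/Latin-1 form reproduces xcTW exactly, while the UTF-8 form of the same text encodes to a different string.

```python
import base64

text = "ÅÄÖ"

# cp1252 and Latin-1 agree for these three characters: one byte each.
latin1_bytes = text.encode("cp1252")
print(latin1_bytes)                    # b'\xc5\xc4\xd6'
print(base64.b64encode(latin1_bytes))  # b'xcTW', the mangled value from the question

# UTF-8 uses two bytes per character, so its base64 form differs.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                      # b'\xc3\x85\xc3\x84\xc3\x96'
print(base64.b64encode(utf8_bytes))    # b'w4XDhMOW'

# Undoing both steps recovers the original text.
print(base64.b64decode("xcTW").decode("cp1252"))  # ÅÄÖ
```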

But base64 encoding mangles ASCII characters as well:

>>> base64.b64encode('abcdef')
'YWJjZGVm'
>>> 

It looks like you need to look into the transfer encoding.
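
To make "transfer encoding" concrete: in a multipart/form-data body each field is a MIME part, and a part may carry a `Content-Transfer-Encoding: base64` header. If whatever parses the request ignores that header, the handler sees the raw base64 text (xcTW) instead of the decoded bytes. Below is a minimal Python 3 sketch that honours the header using the standard `email` package. The raw part is a hypothetical example, not captured from App Engine, and its ISO-8859-1 charset is an assumption chosen to match the xcTW observation; `decode_mime_part` is a hypothetical helper name.

```python
import email


def decode_mime_part(raw_part, default_charset="utf-8"):
    """Return the text of one MIME part, undoing any Content-Transfer-Encoding.

    Hypothetical helper: raw_part is the headers plus body of a single
    multipart/form-data field, as bytes.
    """
    part = email.message_from_bytes(raw_part)
    # decode=True undoes base64 or quoted-printable according to the
    # part's Content-Transfer-Encoding header and returns bytes.
    payload = part.get_payload(decode=True)
    # Fall back to a default when the part declares no charset.
    charset = part.get_content_charset() or default_charset
    return payload.decode(charset)


# A hypothetical field whose value was base64-encoded in transit.
RAW_PART = (
    b'Content-Disposition: form-data; name="text"\r\n'
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"xcTW\r\n"
)

if __name__ == "__main__":
    print(decode_mime_part(RAW_PART))
```

A part without the header passes through unchanged, so the same function handles both encoded and plain fields.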

If you can't work out what is happening from this, try publishing your two pieces of code.

Update: more of the train of thought. A "blob" is a Binary Large OBject, hence the base64 encoding, to ensure that it can be transported across a network that might not be 8-bit clean. I'm not sure why you are using blobs if you are expecting text. If you really must keep that third base class in there, then just use base64.b64decode() on the bytes that are returned. If all else fails, read the GAE docs to see if there's a way of turning off the base64 encoding.

Even more train of thought: perhaps the blob handler transmits the value as ASCII if it fits and otherwise base64-encodes it; this would fit the reported behaviour. In that case you have to detect what the encoding is. I say again: read the GAE docs.
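
If that guess is right, the handler has to tell an encoded value from a plain one. Below is a rough Python 3 heuristic (3.7+ for `str.isascii`), assuming, as the xcTW example suggests, that encoded values are base64 of the text's cp1252/Latin-1 bytes. It treats a value as encoded only if it is strictly valid base64 and decodes to printable text containing at least one non-ASCII character. This is inherently ambiguous, because some ordinary ASCII words are also valid base64, so the docs or the request headers remain the reliable source. `maybe_decode_field` is a hypothetical name.

```python
import base64
import binascii


def maybe_decode_field(value, charset="cp1252"):
    """Heuristically undo base64 on a form value; return it unchanged otherwise.

    Assumption (a guess from the thread, not confirmed): ASCII-only input
    arrives as-is, while non-ASCII input arrives as base64 of its `charset` bytes.
    """
    # An encoded value would be pure ASCII with a length that is a multiple of 4.
    if not value.isascii() or len(value) % 4 != 0:
        return value
    try:
        text = base64.b64decode(value, validate=True).decode(charset)
    except (binascii.Error, UnicodeDecodeError):
        return value
    # ASCII-only input would not have been encoded, so a decode that yields
    # only ASCII, or unprintable characters, is probably a false positive.
    if text.isascii() or not text.isprintable():
        return value
    return text
```

For example, `maybe_decode_field("xcTW")` returns `"ÅÄÖ"`, while `"YWJjZGVm"` is left alone because it decodes to plain ASCII that, under this assumption, would not have been encoded.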
