DjangoUnicodeDecodeError while storing pickle'd data

I've got a simple dict object that I'm trying to store in the database after running it through pickle. Django seems to fail when it tries to encode the pickled data. I've checked with MySQL, and the error is thrown before the query even gets there, so I don't believe the database is the problem. The dict I'm storing looks like this:

{
    'ordered': [
        {   'value': u'First\xd1ame Last\xd1ame',
            'label': u'Full Name' },
        {   'value': u'123-456-7890',
            'label': u'Phone Number' },
        {   'value': u'user@nowhere.org',
            'label': u'Email Address' } ],
    'cleaned_data': {
        u'Phone Number': u'123-456-7890',
        u'Full Name': u'First\xd1ame Last\xd1ame',
        u'Email Address': u'user@nowhere.org' },
    'post_data': <QueryDict: {
        u'Phone Number': [u'1234567890'],
        u'Full Name_1': [u'Last\xd1ame'],
        u'Full Name_0': [u'First\xd1ame'],
        u'Email Address': [u'user@nowhere.org'] }>,
    'user': <User: itis>
}

The error that gets thrown is:

'utf8' codec can't decode bytes in position 52-53: invalid data.

Position 52-53 is the first instance of \xd1 (Ñ) in the pickled data.
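
The failure can be reproduced without Django or MySQL. The sketch below is a minimal illustration, written for Python 3, where protocol 0 has to be requested explicitly (on Python 2, which the question uses, it is the default). The variable names are made up for the example. It shows that a protocol 0 pickle of a string containing Ñ holds the raw byte 0xD1, and that those bytes cannot be decoded as UTF-8.

```python
import pickle

name = u'First\xd1ame Last\xd1ame'

# Protocol 0 is the text-based protocol (the default on Python 2).
pickled = pickle.dumps(name, protocol=0)

# The Ñ is written as the single raw byte 0xD1, so the pickle is not pure ASCII.
has_raw_byte = b'\xd1' in pickled

# Decoding those bytes as UTF-8 fails, which is what an implicit
# bytes-to-text conversion runs into.
try:
    pickled.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(has_raw_byte, decode_failed)
```
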

So far, I've dug around StackOverflow and found a few questions where the database encoding for the objects was wrong. This doesn't help me because there is no MySQL query yet. This is happening before the database. Google also didn't help much when searching for unicode errors on pickled data.

It is probably worth mentioning that if I don't use the Ñ, this code works fine.


With much thanks to @prometheus, I found a solution for this. Basically you can use base64 to encode the output of pickle.dumps() before plugging it into the database. You would then turn around and use base64 to decode the output of the database before passing it to pickle.loads().

My code now looks like this:

import base64
import pickle

## Put the information into the database:
self.raw_data = base64.b64encode(pickle.dumps(data))

## Get the information out of the database:
return pickle.loads(base64.b64decode(self.raw_data))
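
As a standalone sketch of the same round trip, the helpers below wrap the two calls so they can be tried outside a model. The function names `encode_for_storage` and `decode_from_storage` are hypothetical, and the `.decode('ascii')` step is an assumption for Python 3, where `base64.b64encode` returns bytes rather than a text string.

```python
import base64
import pickle


def encode_for_storage(data):
    """Pickle `data` and wrap it in base64 so the result is plain ASCII text."""
    return base64.b64encode(pickle.dumps(data)).decode('ascii')


def decode_from_storage(raw):
    """Reverse encode_for_storage. Only call this on data you wrote yourself."""
    return pickle.loads(base64.b64decode(raw))


record = {
    'cleaned_data': {u'Full Name': u'First\xd1ame Last\xd1ame'},
}

stored = encode_for_storage(record)
restored = decode_from_storage(stored)
```

Because base64 output uses only ASCII characters, an implicit text conversion of the stored value has nothing to choke on. The cost is that the stored value is no longer readable when inspecting the table by hand.
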

Again, thank you @prometheus.


That's a known problem, and there was a discussion about this on the Python bug-tracker:

I ran into this problem today when writing python data structures into a database. Only ASCII is safe in this situation. I understood the Python docs that protocol 0 was ASCII-only.

I use pickle+base64 now, however, this makes debugging more difficult.

Anyway, I think that the docs should clearly say that protocol 0 is not ASCII-only because this is important in the Python world. For example, I saw this issue because Django makes an implicit unicode() conversion with my input which fails with non-ASCII.


I see no need to do so. Normally, it should be possible to store any binary data in a database.

A worse problem is that pickling is not safe - if the database could get its data from anywhere, it could get malicious pickling data.
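
To make the safety point concrete, here is a deliberately harmless sketch. The `Payload` class is hypothetical, and `len` stands in for whatever callable an attacker would choose. The pickled bytes decide which callable runs when they are loaded, and base64 does nothing to change that.

```python
import pickle


class Payload(object):
    def __reduce__(self):
        # Tells pickle: "to rebuild me, call len('abc')".
        # A hostile pickle could name a far less innocent callable here.
        return (len, ('abc',))


blob = pickle.dumps(Payload())

# Loading does not give back a Payload; it gives back the callable's result.
result = pickle.loads(blob)
print(result)
```
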
