
How to pickle a Python dictionary into MySQL?

I looked through several SO questions about how to pickle a Python object and store it in a database. The information I collected is:

  • import pickle or import cPickle. Import the latter if performance is an issue.
  • Assume dict is a Python dictionary (or whatever Python object): pickled = pickle.dumps(dict).
  • Store pickled in a MySQL BLOB column using whatever module you like to communicate with the database.
  • Get it out again and use pickle.loads(pickled) to restore the Python dictionary.

I just want to make sure I understood this right. Did I miss something critical? Are there side effects? Is it really that easy?

Background info: All I want to do is store Google geocoder responses, which in my case are nested Python dictionaries. I only use a small part of the response object and don't know whether I will ever need more of it later on. That's why I thought of storing the whole response, to save me from repeating some million queries.


It's really that easy... so long as you don't need your DB to know anything about the dictionary. If you need any sort of structured data access to the contents of the dictionary, then you're going to have to get more involved.
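
A minimal sketch of the round trip might look like this (the table and column names are made up here, and pymysql is just one example of a client module you could use):

import pickle
import pymysql                         # any MySQL client module would do

conn = pymysql.connect(host='localhost', user='user', password='pass', db='geo')
cur = conn.cursor()

response = {'lat': 52.52, 'lng': 13.40, 'address': {'city': 'Berlin'}}   # stand-in for a geocoder response
pickled = pickle.dumps(response)       # serialize the dict to a byte string

# store it in a BLOB column of a hypothetical `responses` table
cur.execute("INSERT INTO responses (query, blob_data) VALUES (%s, %s)", ('Berlin', pickled))
conn.commit()

# later: read it back and restore the dictionary
cur.execute("SELECT blob_data FROM responses WHERE query = %s", ('Berlin',))
restored = pickle.loads(cur.fetchone()[0])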

Another gotcha might be what you intend to put in the dict. Python's pickle serialization is quite intelligent and can handle most cases without any need for adding custom support. However, when it doesn't work, it can be very difficult to understand what's gone wrong. So if you can, restrict the contents of the dict to Python's built-in types. If you start adding instances of custom classes, keep them to simple custom classes that don't do any funny stuff with attribute storage or access. And beware of adding instances of classes or types from add-ons. In general, if you start running into hard-to-understand problems with the pickling or unpickling, look at the non-built-in types in the dict.
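
As a quick illustration of that advice: a dict of built-in types round-trips cleanly, while something like a lambda stuffed into it fails immediately (assuming the plain standard-library pickle, with no add-on serializers):

import pickle

ok = {'lat': 52.52, 'tags': ['geo', 'api'], 'meta': {'hits': 3}}
assert pickle.loads(pickle.dumps(ok)) == ok      # built-in types round-trip fine

bad = {'callback': lambda x: x + 1}              # a lambda can't be pickled by the standard module
try:
    pickle.dumps(bad)
except Exception as e:                           # pickle.PicklingError on most versions
    print(repr(e))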


If speed is really important: I just ran a test of loading a large Python dictionary (35 MB) from a pickle versus SELECTing it from a MySQL table with all keys and values stored in rows:

Pickle Method:

import time, pickle

t1 = time.clock()                    # start timer (Python 2; use time.perf_counter() on Python 3)
f = open('story_data.pickle', 'rb')
s = pickle.load(f)                   # load the whole 35 MB dictionary from disk
print time.clock() - t1

MySQL Method:

import time
import database as db                # 'database' is a custom helper module, not a standard library

t1 = time.clock()
data, msg = db.mysql("""SELECT id, story FROM story_data;""")
data_dict = dict([(int(x), y.split(',')) for x, y in data])   # rebuild {id: [list of values]} in memory
print time.clock() - t1

Output:

pickle method: 32.0785171704
mysql method: 3.25916336479

If a ten-fold speed enhancement is enough, the structure of the database probably doesn't matter much. Note that I am splitting all the comma-separated data into lists as the values for 36,000 keys, and it still only takes about 3 seconds. So I've switched away from using pickles for large data sets: the rest of the 400-line program I was using took about 3 seconds, while the pickle loading alone took 32 seconds.

Also note:

cPickle works just like pickle and is over 50% faster.
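
In Python 2 the usual idiom is to fall back to the pure-Python module when the C version isn't available (on Python 3 there is no separate cPickle; the standard pickle module already uses the C implementation under the hood):

try:
    import cPickle as pickle      # C implementation (Python 2 only)
except ImportError:
    import pickle                 # pure-Python fallback / Python 3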

Don't try to pickle a class full of dictionaries and save it in MySQL: it doesn't reconstitute itself correctly, at least it didn't for me.


If you have nested dictionaries, you have to be careful. Many Python objects don't pickle (and you can stuff any object into a dict as a value). Worse still, even fewer Python objects can be converted to strings and stored in SQL.

However, if you use klepto, serialization and storage in a database are pretty transparent, and it works for most Python objects.

Let's build some typical Python objects in a dict (or dicts):

>>> class Foo(object):                                 
...   def bar(self, x):
...     return self.y + x
...   y = 1
... 
>>> d1 = {'a': min, 'b': lambda x:x**2, 'c': [1,2,3], 'd': Foo()}
>>> f = Foo(); f.y = 100
>>> d2 = {'a': max, 'b': lambda x:x**3, 'c': [2,1,3], 'd': f}

Now, let's build a nested dict and dump it to a MySQL archive.

>>> import klepto
>>> a = klepto.archives.sql_archive('mysql://user:pass@localhost/foo', dict={'d1':d1, 'd2':d2})
>>> a.dump()

Now, we delete our interface to the archive… and build a new one. The load loads all the objects into memory.

>>> del a
>>> b = klepto.archives.sql_archive('mysql://user:pass@localhost/foo')
>>> b.load()

We now access the objects in the in-memory copies.

>>> b['d1']
{'a': <built-in function min>, 'c': [1, 2, 3], 'b': <function <lambda> at 0x1037ccd70>, 'd': <__main__.Foo object at 0x103938ed0>}
>>> b['d1']['b'](b['d1']['d'].bar(1))
4
>>> b['d2']['b'](b['d2']['d'].bar(1))
1030301
>>> 

We quit Python… and then start up a new session. This time, we decide to use cached=False, so we will interact with the database directly.

dude@hilbert>$ python
Python 2.7.10 (default, May 25 2015, 13:16:30) 
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import klepto
>>> b = klepto.archives.sql_archive('mysql://user:pass@localhost/foo', cached=False)
>>> b['d2']['b'](b['d2']['d'].bar(1))
1030301
>>> b['d1']['b'](b['d1']['d'].bar(1))
4
>>> 

klepto leverages SQLAlchemy, so it works across several database backends… and, additionally, it provides the same dict-based interface for storage on disk (in a file or a directory).
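
The on-disk archives use the same interface. A small sketch, reusing d1 and d2 from above (the archive names here are made up):

>>> import klepto
>>> fa = klepto.archives.file_archive('geo_cache.pkl', dict={'d1': d1}, serialized=True)
>>> fa.dump()
>>> da = klepto.archives.dir_archive('geo_cache', dict={'d2': d2}, serialized=True)
>>> da.dump()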
