How to find source of error in Python Pickle on massive object

I've taken over somebody's code for a fairly large project. I'm trying to save program state, and there's one massive object which stores pretty much all the other objects. I'm trying to pickle this object, but I get this error:

pickle.PicklingError: Can't pickle <type 'module'>: it's not found as __builtin__.module

From what I can find on Google, this is because somewhere I'm importing something outside of python init, or because a class attribute is referencing a module. So, I've got two questions:

  1. Can anybody confirm that that's why this error is being given? Am I looking for the right things in my code?

  2. Is there a way to find what line of code/object member is causing the difficulties in pickle? The traceback only gives the line in pickle where the error occurs, not the line of the object being pickled.


2) You can subclass pickle.Pickler and override its save method to log each object as it is pickled. This should make it easier to trace where the problem is.

# Python 2.x
import pickle
class MyPickler (pickle.Pickler):
    def save(self, obj):
        # Log every object before handing it to the real save()
        print 'pickling object', obj, 'of type', type(obj)
        pickle.Pickler.save(self, obj)

This will only work with the pure-Python implementation of pickle.Pickler. In Python 3.x, the pickle module uses the C implementation by default; the pure-Python version of Pickler is called _Pickler.

# Python 3.x
import pickle
class MyPickler (pickle._Pickler):
    def save(self, obj):
        print('pickling object {0} of type {1}'.format(obj, type(obj)))
        pickle._Pickler.save(self, obj)
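
For Python 3, a variation on the subclass above can record the objects instead of printing them, so that the object being saved when pickling failed can be inspected afterwards. This is only a sketch: the names TracingPickler and last_saved_before_failure are hypothetical, and it relies on pickle._Pickler, which is an undocumented pure-Python class, so treat it as a debugging aid rather than a supported API.

```python
import io
import pickle


class TracingPickler(pickle._Pickler):
    """Pure-Python pickler that remembers every object passed to save()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.seen = []  # objects in the order save() was entered

    def save(self, obj, *args, **kwargs):
        self.seen.append(obj)
        return super().save(obj, *args, **kwargs)


def last_saved_before_failure(root):
    """Return (exception, last object save() was called on), or (None, None)."""
    pickler = TracingPickler(io.BytesIO())
    try:
        pickler.dump(root)
    except Exception as exc:  # typically PicklingError or TypeError
        return exc, pickler.seen[-1]
    return None, None
```

The last recorded object is usually, though not necessarily, the one that cannot be pickled. The list keeps a reference to everything it sees, so on a very large object it may be worth keeping only the last few entries.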


Something like this exists in dill. Let's look at a list of objects, and see what we can do:

>>> import dill
>>> f = open('whatever', 'w')
>>> f.close()
>>> 
>>> l = [iter([1,2,3]), xrange(5), open('whatever', 'r'), lambda x:x]
>>> dill.detect.trace(False)
>>> dill.pickles(l)
False

Ok, dill fails to pickle the list. So what's the problem?

>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False

Ok, the first item in the list fails to pickle. What about the rest?

>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
Fi: <open file 'whatever', mode 'r' at 0x10699c810>
F2: <function _create_filehandle at 0x106991848>
B2: <built-in function open>
F1: <function <lambda> at 0x1069f6848>
F2: <function _create_function at 0x1069916e0>
Co: <code object <lambda> at 0x105a0acb0, file "<stdin>", line 1>
F2: <function _unmarshal at 0x106991578>
D1: <dict object at 0x10591d168>
D2: <dict object at 0x1069b1050>
[False, True, True, True]

Hm. The other objects pickle just fine. So, let's replace the first object.

>>> dill.detect.trace(False)
>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
True
>>> _l = dill.loads(dill.dumps(l))

Now our object pickles. However, we could be benefiting from some built-in object sharing that happens for pickling on Linux/Unix/Mac. So what about a stronger check, like actually pickling across a subprocess (as happens on Windows)?

>>> dill.check(l)        
[xrange(1, 4), xrange(5), <open file 'whatever', mode 'r' at 0x107998810>, <function <lambda> at 0x1079ec410>]
>>> 

Nope, the list still works… so this is an object that could be sent to another process successfully.

Now, with regard to your error, which everyone seemed to ignore…

The ModuleType object is not pickleable, and that's causing your error.

>>> import types
>>> types.ModuleType 
<type 'module'>
>>>
>>> import pickle
>>> pickle.dumps(types.ModuleType)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type 'module'>: it's not found as __builtin__.module

However, if we import dill, it magically works.

>>> import dill
>>> pickle.dumps(types.ModuleType)
"cdill.dill\n_load_type\np0\n(S'ModuleType'\np1\ntp2\nRp3\n."
>>> 


As a quick-and-dirty way to find what attribute/member of the object is causing the problem, you could try:

for k, v in massiveobject.__dict__.iteritems():
    print k
    pickle.dumps(v)
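
The loop above is written for Python 2 and stops at the first attribute that fails. A Python 3 sketch of the same idea, which keeps going and collects every failing attribute, might look like this. The function name unpicklable_attributes is hypothetical, and it assumes the object stores its state in an instance __dict__ (it will not see __slots__ or class-level attributes).

```python
import pickle


def unpicklable_attributes(obj):
    """Map attribute name -> exception for each attribute that fails to pickle."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # typically PicklingError or TypeError
            failures[name] = exc
    return failures
```

Calling it again on a failing attribute's value narrows the search one level at a time.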


1) There's a slight difference from what you've found. This is a problem caused by some variable (class attribute, list or dict item, it could be anything) that is referencing the module type (not a module directly). This code should reproduce the issue:

import pickle
pickle.dumps(type(pickle))
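
On Python 3 the same distinction can be checked with a small helper; both the module type and a module object itself fail to pickle with the standard pickle module, although the exception class and message differ between the two cases and across versions. The helper name pickling_error is hypothetical.

```python
import pickle
import types


def pickling_error(obj):
    """Return the exception raised by pickle.dumps(obj), or None on success."""
    try:
        pickle.dumps(obj)
    except Exception as exc:
        return exc
    return None


# The module *type*, which is what the error message in the question names
type_error = pickling_error(types.ModuleType)

# A module *object*, e.g. an attribute that holds an imported module
module_error = pickling_error(types)
```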