Python pickle - how does it break?
Everyone knows pickle is not a secure way to store user data. It even says so on the box.
I'm looking for examples of strings or data structures that break pickle parsing in the current supported versions of cPython >= 2.4
. Are there things that can be pickled but not unpickled? Are there problems with particular unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?
I'm particularly curious about ways in which the pickle loads
operation can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the .
?
What sort of edge cases are there?
Edit: Here are some examples of the sort of thing I'm looking for:
- In Python 2.4, you can pickle an array without error, but you can't unpickle it. http://bugs.python.org/issue1281383
- You can't reliably pickle objects that inherit from dict and call
__setitem__
before instance variables are set with__setstate__
. This can be a gotcha when pickling Cookie objects. See http://bugs.python.org/issue964868 and http://bugs.python.org/issue826897 - Python 2.4 (and 2.5?) will return a pickle value for infinity (or values close to it like 1e100000), but may (depending on platform) fail when loading. See http://bugs.python.org/issue880990 and http://bugs.python.org/issue445484
- This last item is interesting because it reveals a case where the
STOP
开发者_如何学Python marker does not actually stop parsing - when the marker exists as part of a literal, or more generally, when not preceded by a newline.
This is a greatly simplified example of what pickle didn't like about my data structure.
import cPickle as pickle
class Member(object):
def __init__(self, key):
self.key = key
self.pool = None
def __hash__(self):
return self.key
class Pool(object):
def __init__(self):
self.members = set()
def add_member(self, member):
self.members.add(member)
member.pool = self
member = Member(1)
pool = Pool()
pool.add_member(member)
with open("test.pkl", "w") as f:
pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)
with open("test.pkl", "r") as f:
x = pickle.load(f)
Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.
In this particular example it partially unpickles the member and then encounters the pool. So it then partially unpickles the pool and encounters the members set. So it creates the set and tries to add the partially unpickled member to the set. At which point it dies in the custom hash function, because the member is only partially unpickled. I dread to think what might happen if you had an "if hasattr..." in the hash function.
$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
File "test.py", line 25, in <module>
x = pickle.load(f)
File "test.py", line 8, in __hash__
return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))
If you are interested in how things fail with pickle
(or cPickle
, as it's just a slightly different import), you can use this growing list of all the different object types in python to test against fairly easily.
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
The package dill
includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.
dill.dill
has these functions, which you could also build for pickle
or cPickle
, simply with a cut-and-paste and an import pickle
or import cPickle as pickle
(or import dill as pickle
):
def copy(obj, *args, **kwds):
"""use pickling to 'copy' an object"""
return loads(dumps(obj, *args, **kwds))
# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
"""quick check if object pickles with dill"""
if safe: exceptions = (Exception,) # RuntimeError, ValueError
else:
exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
try:
pik = copy(obj, **kwds)
try:
result = bool(pik.all() == obj.all())
except AttributeError:
result = pik == obj
if result: return True
if not exact:
return type(pik) == type(obj)
return False
except exceptions:
return False
and includes these in dill.detect
:
def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
"""get items in object that fail to pickle"""
if not hasattr(obj,'__iter__'): # is not iterable
return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
obj = obj.values() if getattr(obj,'values',None) else obj
_obj = [] # can't use a set, as items may be unhashable
[_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
return [j for j in _obj if j is not None]
def badobjects(obj, depth=0, exact=False, safe=False):
"""get objects that fail to pickle"""
if not depth:
if pickles(obj,exact,safe): return None
return obj
return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
def badtypes(obj, depth=0, exact=False, safe=False):
"""get types for objects that fail to pickle"""
if not depth:
if pickles(obj,exact,safe): return None
return type(obj)
return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
and this last function, which is what you can use to test the objects in dill._objects
def errors(obj, depth=0, exact=False, safe=False):
"""get errors for objects that fail to pickle"""
if not depth:
try:
pik = copy(obj)
if exact:
assert pik == obj, \
"Unpickling produces %s instead of %s" % (pik,obj)
assert type(pik) == type(obj), \
"Unpickling produces %s instead of %s" % (type(pik),type(obj))
return None
except Exception:
import sys
return sys.exc_info()[1]
return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
It is possible to pickle class instances. If I knew what classes your application uses, then I could subvert them. A contrived example:
import subprocess
class Command(object):
def __init__(self, command):
self._command = self._sanitize(command)
@staticmethod
def _sanitize(command):
return filter(lambda c: c in string.letters, command)
def run(self):
subprocess.call('/usr/lib/myprog/%s' % self._command, shell=True)
Now if your program creates Command
instances and saves them using pickle, and I could subvert or inject into that storage, then I could run any command I choose by setting self._command
directly.
In practice my example should never pass for secure code anyway. But note that if the sanitize
function is secure, then so is the entire class, apart from the possible use of pickle from untrusted data breaking this. Therefore, there exist programs which are secure but can be made insecure by the inappropriate use of pickle.
The danger is that your pickle-using code could be subverted along the same principle but in innocent-looking code where the vulnerability is far less obvious. The best thing to do is to always avoid using pickle to load untrusted data.
精彩评论