开发者

Python pickle - how does it break?

Everyone knows pickle is not a secure way to store user data. It even says so on the box.

I'm looking for examples of strings or data structures that break pickle parsing in the current supported versions of cPython >= 2.4. Are there things that can be pickled but not unpickled? Are there problems with particular unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?

I'm particularly curious about ways in which the pickle loads operation can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the .?

What sort of edge cases are there?

Edit: Here are some examples of the sort of thing I'm looking for:

  • In Python 2.4, you can pickle an array without error, but you can't unpickle it. http://bugs.python.org/issue1281383
  • You can't reliably pickle objects that inherit from dict and call __setitem__ before instance variables are set with __setstate__. This can be a gotcha when pickling Cookie objects. See http://bugs.python.org/issue964868 and http://bugs.python.org/issue826897
  • Python 2.4 (and 2.5?) will return a pickle value for infinity (or values close to it like 1e100000), but may (depending on platform) fail when loading. See http://bugs.python.org/issue880990 and http://bugs.python.org/issue445484
  • This last item is interesting because it reveals a case where the STOP开发者_如何学Python marker does not actually stop parsing - when the marker exists as part of a literal, or more generally, when not preceded by a newline.


This is a greatly simplified example of what pickle didn't like about my data structure.

import cPickle as pickle

class Member(object):
    def __init__(self, key):
        self.key = key
        self.pool = None
    def __hash__(self):
        return self.key

class Pool(object):
    def __init__(self):
        self.members = set()
    def add_member(self, member):
        self.members.add(member)
        member.pool = self

member = Member(1)
pool = Pool()
pool.add_member(member)

with open("test.pkl", "w") as f:
    pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)

with open("test.pkl", "r") as f:
    x = pickle.load(f)

Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.

In this particular example it partially unpickles the member and then encounters the pool. So it then partially unpickles the pool and encounters the members set. So it creates the set and tries to add the partially unpickled member to the set. At which point it dies in the custom hash function, because the member is only partially unpickled. I dread to think what might happen if you had an "if hasattr..." in the hash function.

$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    x = pickle.load(f)
  File "test.py", line 8, in __hash__
    return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))


If you are interested in how things fail with pickle (or cPickle, as it's just a slightly different import), you can use this growing list of all the different object types in python to test against fairly easily.

https://github.com/uqfoundation/dill/blob/master/dill/_objects.py

The package dill includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.

dill.dill has these functions, which you could also build for pickle or cPickle, simply with a cut-and-paste and an import pickle or import cPickle as pickle (or import dill as pickle):

def copy(obj, *args, **kwds):
    """use pickling to 'copy' an object"""
    return loads(dumps(obj, *args, **kwds))


# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
    """quick check if object pickles with dill"""
    if safe: exceptions = (Exception,) # RuntimeError, ValueError
    else:
        exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
    try:
        pik = copy(obj, **kwds)
        try:
            result = bool(pik.all() == obj.all())
        except AttributeError:
            result = pik == obj
        if result: return True
        if not exact:
            return type(pik) == type(obj)
        return False
    except exceptions:
        return False

and includes these in dill.detect:

def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
    """get items in object that fail to pickle"""
    if not hasattr(obj,'__iter__'): # is not iterable
        return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
    obj = obj.values() if getattr(obj,'values',None) else obj
    _obj = [] # can't use a set, as items may be unhashable
    [_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
    return [j for j in _obj if j is not None]


def badobjects(obj, depth=0, exact=False, safe=False):
    """get objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return obj
    return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

def badtypes(obj, depth=0, exact=False, safe=False):
    """get types for objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return type(obj)
    return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

and this last function, which is what you can use to test the objects in dill._objects

def errors(obj, depth=0, exact=False, safe=False):
    """get errors for objects that fail to pickle"""
    if not depth:
        try:
            pik = copy(obj)
            if exact:
                assert pik == obj, \
                    "Unpickling produces %s instead of %s" % (pik,obj)
            assert type(pik) == type(obj), \
                "Unpickling produces %s instead of %s" % (type(pik),type(obj))
            return None
        except Exception:
            import sys
            return sys.exc_info()[1]
    return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))


It is possible to pickle class instances. If I knew what classes your application uses, then I could subvert them. A contrived example:

import subprocess

class Command(object):
    def __init__(self, command):
        self._command = self._sanitize(command)

    @staticmethod
    def _sanitize(command):
        return filter(lambda c: c in string.letters, command)

    def run(self):
        subprocess.call('/usr/lib/myprog/%s' % self._command, shell=True)

Now if your program creates Command instances and saves them using pickle, and I could subvert or inject into that storage, then I could run any command I choose by setting self._command directly.

In practice my example should never pass for secure code anyway. But note that if the sanitize function is secure, then so is the entire class, apart from the possible use of pickle from untrusted data breaking this. Therefore, there exist programs which are secure but can be made insecure by the inappropriate use of pickle.

The danger is that your pickle-using code could be subverted along the same principle but in innocent-looking code where the vulnerability is far less obvious. The best thing to do is to always avoid using pickle to load untrusted data.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜