How to save a dictionary containing UTF-8 characters as its keys to a file with cPickle in Python?
I want to know how to save a dictionary containing UTF-8 characters as its keys to a file in Python with cPickle. The dictionary is very large, and I've heard that cPickle is much faster than pickle. I also suppose that having UTF-8 encoded keys is problematic.
Any other fast solutions are also welcome.
Here is what I do, and below is the error message:
import codecs
import cPickle
from collections import defaultdict

unique_ngrams_dict = defaultdict(lambda: 0)  # just to show how I defined my dict
dict_file = codecs.open('ngram_dict', 'w', 'utf-8')
cPickle.dump(unique_ngrams_dict, dict_file)
dict_file.close()
Error message:
Traceback (most recent call last):
  File "Generate_NGram.py", line 81, in <module>
    save_ngram_dict(unique_ngrams_dict)
  File "Generate_NGram.py", line 70, in save_ngram_dict
    cPickle.dump(unique_ngrams_dict,dict_file)
  File "/usr/lib/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle function objects
Thanks.
Pickle is a binary format, so you shouldn't open the file with any codecs; just open it in binary mode:
open('ngram_dict', 'wb')
The codec is not the reason it's failing, it is just quite inefficient.
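As a rough sketch of the binary-mode approach described above: the snippet below writes a small dictionary with non-ASCII keys and reads it back. The sample keys, the temporary directory, and the choice of pickle.HIGHEST_PROTOCOL are my own assumptions for illustration, not something from the question. The try/except import is only there so the same code runs where cPickle does not exist.

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically

# Made-up sample data standing in for the real n-gram dictionary.
ngram_counts = {u'na\xefve bayes': 3, u'plain ascii': 1}

# Hypothetical location: a temp directory stands in for the real 'ngram_dict' path.
path = os.path.join(tempfile.mkdtemp(), 'ngram_dict')

# 'wb' and 'rb': pickle output is bytes, so no codec and no text mode.
with open(path, 'wb') as dict_file:
    pickle.dump(ngram_counts, dict_file, pickle.HIGHEST_PROTOCOL)

with open(path, 'rb') as dict_file:
    loaded = pickle.load(dict_file)
```

Passing a binary protocol is optional; on Python 2 it usually gives a smaller file and faster dumps than the default text protocol, which may matter for a very large dictionary.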
The actual problem is that the object you are trying to save contains a function reference (the default factory lambda: 0), and pickle cannot serialize a lambda: it stores functions by name only, and a lambda has no importable name. You have three options:
- Use a regular dict and use its .get method with a default argument.
- Set unique_ngrams_dict.default_factory = None before pickling and set it back to unique_ngrams_dict.default_factory = lambda: 0 after unpickling.
- Define a class at module level like:
  class NgramDefault(object):
      def __call__(self):
          return 0
  and use NgramDefault() as the default factory instead of lambda: 0.
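The three options can be sketched roughly as follows. The sample n-grams and variable names are invented for illustration, and everything is pickled in memory with dumps/loads to keep the example short. Note that NgramDefault has to be defined at module level (here, in the script itself) so that pickle can find it again when loading.

```python
from collections import defaultdict

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3


class NgramDefault(object):
    """Picklable replacement for lambda: 0 (must live at module level)."""

    def __call__(self):
        return 0


# Option 1: a plain dict, counting with .get and a default of 0.
plain = {}
for gram in [u'a b', u'b c', u'a b']:
    plain[gram] = plain.get(gram, 0) + 1
plain_copy = pickle.loads(pickle.dumps(plain))

# Option 2: remove the lambda before pickling, put it back afterwards.
counts = defaultdict(lambda: 0)
counts[u'a b'] += 2
saved_factory = counts.default_factory
counts.default_factory = None           # nothing unpicklable left
data = pickle.dumps(counts)
counts.default_factory = saved_factory  # original object works as before
restored = pickle.loads(data)
restored.default_factory = lambda: 0    # re-attach after unpickling

# Option 3: an instance of a module-level class as the default factory.
by_class = defaultdict(NgramDefault())
by_class[u'b c'] += 1
by_class_copy = pickle.loads(pickle.dumps(by_class))
```

With option 2 the unpickled object behaves like a plain dict (missing keys raise KeyError) until the factory is set again, so it is easy to forget that step. Option 3 avoids that because the factory travels with the pickle.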
You should just do it and trust the pickle module to do the right thing. The best way to treat pickle is as an opaque blob of stuff that will magically re-create the exact data structure you started with when you unpickle it.
Don't try to apply any sort of encoding to the output of pickle; it should be treated as a binary blob. If you have unicode elements when you pickle, they will be unicode once you unpickle.
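To check the claim about unicode for yourself, a small in-memory round trip like the one below should do. The keys are arbitrary sample strings I made up, and text_type is just a helper name so the check works on both Python 2 (unicode) and Python 3 (str).

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Arbitrary sample keys containing non-ASCII characters.
original = {u'caf\xe9 au lait': 5, u'\u0633\u0644\u0627\u0645': 2}

# The pickle itself is an opaque byte string; no encoding step is applied to it.
blob = pickle.dumps(original, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# unicode on Python 2, str on Python 3.
text_type = type(u'')
keys_are_text = all(isinstance(key, text_type) for key in restored)
```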