Python, checksum of a dict

Posted 2023-03-26 00:48

I'm thinking of creating a checksum of a dict to know whether it has been modified. For the moment I have this:

>>> import hashlib
>>> import pickle
>>> d = {'k': 'v', 'k2': 'v2'}
>>> z = pickle.dumps(d)
>>> hashlib.md5(z).hexdigest()
'8521955ed8c63c554744058c9888dc30'

Perhaps a better solution exists?

Note: I want to create a unique ID of a dict to create a good ETag.

EDIT: I can have abstract data in the dict.

Something like this:

from functools import reduce  # reduce is not a builtin in Python 3
reduce(lambda x, y: x ^ y, [hash(item) for item in d.items()])

Take the hash of each (key, value) tuple in the dict and XOR them all together.
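As a rough sketch of the same idea wrapped in a function (the name xor_items_hash is made up for illustration): XOR is commutative, so the result does not depend on the order of the items, but hash() still fails as soon as a value is unhashable, and for strings the result is only stable within one interpreter session.

```python
from functools import reduce
from operator import xor


def xor_items_hash(d):
    """XOR the built-in hash of every (key, value) tuple in the dict."""
    # The initial value 0 makes an empty dict hash to 0 instead of raising.
    return reduce(xor, (hash(item) for item in d.items()), 0)


a = {'k': 'v', 'k2': 'v2'}
b = {'k2': 'v2', 'k': 'v'}   # same content, different insertion order
same = xor_items_hash(a) == xor_items_hash(b)

try:
    xor_items_hash({'k': [1, 2]})   # a list value cannot be hashed
    unhashable_failed = False
except TypeError:
    unhashable_failed = True
```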

@katrielalex If the dict contains unhashable items you could do this:

hash(str(d))

or maybe even better

hash(repr(d))

In Python 3, the hash of str and bytes objects is salted with a random value that differs between interpreter sessions (unless PYTHONHASHSEED is set), so hash() results are not stable across runs. If that is not acceptable for the intended application, use e.g. zlib.adler32 to build the checksum for a dict:

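import zlib

d = {'key1': 'value1', 'key2': 'value2'}
checksum = 0
for item in d.items():
    c1 = 1  # Adler-32 starting value
    for t in item:
        # Feed the repr of the key, then of the value, into one running checksum
        c1 = zlib.adler32(bytes(repr(t), 'utf-8'), c1)
    # XOR the per-item checksums so that item order does not matter
    checksum = checksum ^ c1

print(checksum)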
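The same loop can be wrapped in a function to check that the result really is independent of item order. This is only a sketch: the name dict_adler32 is invented here, it relies on repr() being stable for the keys and values involved, and Adler-32 is a weak 32-bit checksum, so collisions are quite possible.

```python
import zlib


def dict_adler32(d):
    """XOR together one Adler-32 value per (key, value) pair."""
    checksum = 0
    for key, value in d.items():
        c = 1  # Adler-32 starting value
        for part in (key, value):
            c = zlib.adler32(repr(part).encode('utf-8'), c)
        checksum ^= c
    return checksum


a = {'key1': 'value1', 'key2': 'value2'}
b = {'key2': 'value2', 'key1': 'value1'}   # same content, other order
c = {'key1': 'value1', 'key2': 'changed'}

order_independent = dict_adler32(a) == dict_adler32(b)
detects_change = dict_adler32(a) != dict_adler32(c)
```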

I would recommend an approach very similar to the one you propose, but with some extra guarantees:

import hashlib, json
hashlib.md5(json.dumps(d, sort_keys=True, ensure_ascii=True).encode('utf-8')).hexdigest()

sort_keys=True: keeps the same hash if the order of your keys changes
ensure_ascii=True: escapes any non-ASCII characters (this is already the default), so the representation does not change

We use this for our ETag.
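A minimal sketch of that one-liner as a reusable helper (dict_etag is a hypothetical name). It assumes every value is JSON-serializable and that the keys at each level can be sorted against each other; otherwise json.dumps raises TypeError.

```python
import hashlib
import json


def dict_etag(d):
    """Return an MD5 hex digest of the dict's canonical JSON form."""
    payload = json.dumps(d, sort_keys=True, ensure_ascii=True)
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


a = {'k': 'v', 'k2': 'v2'}
b = {'k2': 'v2', 'k': 'v'}          # same content, different insertion order
nested = {'outer': {'b': 1, 'a': 2}}
nested_swapped = {'outer': {'a': 2, 'b': 1}}

etag_a = dict_etag(a)
etag_b = dict_etag(b)
```

Because sort_keys also applies to nested dicts, dict_etag(nested) and dict_etag(nested_swapped) come out equal as well.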

I don't know whether pickle guarantees that the dict is serialized the same way every time.
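A quick check suggests it does not, at least with respect to key order: pickle writes the items in iteration order, so two dicts that compare equal can still produce different bytes (and therefore different MD5 digests). The variable names below are only for illustration.

```python
import pickle

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}   # equal to d1, but built in another order

same_value = d1 == d2
same_bytes = pickle.dumps(d1) == pickle.dumps(d2)
```

Here same_value is True while same_bytes is False.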

If you only have dictionaries, I would go for a combination of calls to keys() and sorted(): build a string from the sorted key/value pairs and compute the checksum on that.
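That could look roughly like the following. It is a sketch under a few assumptions: the function name and the "key=value;" string layout are invented here, the keys must be sortable against each other, and repr() must be stable for the values.

```python
import hashlib


def sorted_items_checksum(d):
    """MD5 over a string built from the key/value pairs in sorted key order."""
    parts = []
    for key in sorted(d.keys()):
        parts.append('{!r}={!r}'.format(key, d[key]))
    canonical = ';'.join(parts)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()


a = {'k': 'v', 'k2': 'v2'}
b = {'k2': 'v2', 'k': 'v'}   # same content, different insertion order

checksum_a = sorted_items_checksum(a)
checksum_b = sorted_items_checksum(b)
```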

I think you may not realise some of the subtleties that go into this. The first problem is that the order in which items appear in a dict is not part of its value: two equal dicts can list their items in different orders (insertion order since Python 3.7, arbitrary order before that). This means that simply asking for str of a dict doesn't work, because you could have

str(d1) == "{'a': 1, 'b': 2}"
str(d2) == "{'b': 2, 'a': 1}"

and these will hash to different values. If you have only hashable items in the dict, you can hash them and then join up their hashes, as @Bart does, or simply:

hash(tuple(sorted(hash(x) for x in d.items())))

Note the sorted, because you have to ensure that the hashed tuple comes out in the same order irrespective of which order the items appear in the dict. If you have dicts in the dict, you could recurse this, but it will be complicated.
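For what the recursion might look like, here is a sketch (deep_hash is a made-up name). It handles only dicts, lists, tuples and sets, treats a list and a tuple with the same elements as identical, and still relies on hash(), so the result is only comparable within one interpreter session.

```python
def deep_hash(obj):
    """Order-independent hash for nested dicts, lists, tuples and sets."""
    if isinstance(obj, dict):
        # Hash each (key, value) pair, then sort so item order is irrelevant.
        pair_hashes = sorted(
            hash((deep_hash(k), deep_hash(v))) for k, v in obj.items()
        )
        return hash(tuple(pair_hashes))
    if isinstance(obj, (list, tuple)):
        # Sequence order is meaningful, so it is preserved here.
        return hash(tuple(deep_hash(x) for x in obj))
    if isinstance(obj, (set, frozenset)):
        return hash(tuple(sorted(deep_hash(x) for x in obj)))
    return hash(obj)  # raises TypeError for other unhashable objects


a = {'a': {'x': 1, 'y': [1, 2]}, 'b': 2}
b = {'b': 2, 'a': {'y': [1, 2], 'x': 1}}   # same content, other order
c = {'a': {'x': 1, 'y': [1, 3]}, 'b': 2}   # one nested value changed

same_nested = deep_hash(a) == deep_hash(b)
```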

BUT it would be easy to break any implementation like this if you allow arbitrary data in the dictionary, since you can simply write an object with a broken __hash__ implementation and use that. And you can't use id, because then you might have items which compare equal but have different ids.

The moral of the story is that hashing dicts isn't supported in Python for a reason.

As you said, you want to generate an ETag based on the dictionary content. An OrderedDict, which preserves the insertion order of the dictionary, may be a better candidate here. Just iterate through the key/value pairs and construct your ETag string.
