Why is only the suffix of work_index hashed?
I'm reading through the PDF that Brett Slatkin published for Google I/O 2010, "Data pipelines with Google App Engine". In the video of the talk (the Fan-in part) Brett says that the work_index has to be a hash, so that 'you distribute the load across the BigTable', and this is how work_index is created:
work_index = '%s-%d' % (sum_name, knuth_hash(index))
...which I guess creates something like 'mySum-54657651321987'
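To make the key format concrete, here is a minimal sketch of that line in runnable form. It assumes `knuth_hash` is a Knuth-style multiplicative hash modulo 2**32 (the constant 2654435761 is the usual choice for that technique); the helper name `make_work_index` is hypothetical and only there for illustration.

```python
def knuth_hash(number):
    # Assumption: a Knuth-style multiplicative hash, reduced to 32 bits.
    # Consecutive inputs are scattered across the 32-bit range.
    return (number * 2654435761) % 2**32


def make_work_index(sum_name, index):
    # Hypothetical helper: readable prefix, hashed numeric suffix.
    return '%s-%d' % (sum_name, knuth_hash(index))


if __name__ == '__main__':
    for i in range(3):
        print(make_work_index('mySum', i))
```

Under this assumption the suffix is at most 10 decimal digits, and neighbouring values of index produce suffixes that are far apart.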
I do understand the basic idea, but why is only one half of work_index hashed? Is it important to hash only the index part and leave the sum_name prefix unhashed? Would it be wrong to do
md5('%s-%d' % (sum_name, index))
so that the hash would be like '6gw8....hq6'?
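For comparison, a sketch of the whole-string variant I have in mind, written with Python 3's `hashlib`. The function name `hashed_work_index` is hypothetical; the result is always 32 hexadecimal characters and no longer contains sum_name in readable form.

```python
import hashlib


def hashed_work_index(sum_name, index):
    # Hash the entire '<sum_name>-<index>' string instead of only the index.
    raw = '%s-%d' % (sum_name, index)
    # md5 works on bytes, so encode first; hexdigest() is 32 hex characters.
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


if __name__ == '__main__':
    key = hashed_work_index('mySum', 1)
    print(key, len(key))
```

The key has a fixed length, but the sum it belongs to can no longer be read off the key itself.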
I'm a Java guy, so I would use md5 to hash, which means I would get an id like 'mySum' + 32 characters. (Obviously I want my ids/keys to be as short as possible here.) If I could hash the whole string, my id would be just 32 chars.
Or would you suggest using something else to do the hashing?
Brett Slatkin's own explanation