
Hash for unordered set?

I am trying to solve a one-way identity problem: a group of authors wants to publish something without revealing their real usernames. Is there an algorithm or library for hashing an unordered set of usernames?

Some people would suggest sorting the set alphabetically first, then joining, and finally hashing, but that's not an ideal solution for a dynamically growing array.
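For reference, the sort-join-hash approach might look like the sketch below. The function name `hash_username_set`, the choice of SHA-256, and the length-prefix encoding are assumptions made for illustration, not part of any particular library.

```python
import hashlib

def hash_username_set(usernames):
    """Order-independent hash of a set of usernames: sort, join, hash."""
    # Deduplicate and sort so that insertion order cannot affect the result.
    canonical = sorted(set(usernames))
    # Length-prefix each name so that ["ab", "c"] and ["a", "bc"]
    # produce different byte strings before hashing.
    encoded = b"".join(
        len(name.encode("utf-8")).to_bytes(4, "big") + name.encode("utf-8")
        for name in canonical
    )
    return hashlib.sha256(encoded).hexdigest()

# The same set in a different order gives the same digest.
same = hash_username_set(["alice", "bob"]) == hash_username_set(["bob", "alice"])
```

The drawback is visible here: recomputing the hash after adding an author requires the full list of names.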

Additional questions (not compulsory for the main question):

  1. If such an algorithm exists, can we verify from the hash whether a username is one of the authors?
  2. If we already know the hash of a group of usernames and a new author is then added, can we compute the new hash without knowing the previous authors' usernames?


Are you willing to accept a small probability of false positives, that is, of names that aren't authors being incorrectly identified as authors when someone checks? (The probability can be made arbitrarily small.)

If you are, then a Bloom filter would fit the bill perfectly.
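A minimal sketch of the idea, assuming SHA-256 with a counter prefix as the family of hash functions; the class name `BloomFilter` and the parameters `num_bits` and `num_hashes` are hypothetical and should be sized for the false-positive rate you can tolerate:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by prefixing a counter to the item.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Setting bits is commutative, so insertion order does not matter.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "probably an author"; False means "definitely not".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

authors = BloomFilter()
authors.add("alice")
authors.add("bob")
is_listed = "alice" in authors  # True: added items are always found
```

Adding an author only needs the current bit array, not the earlier usernames, and membership can be checked by anyone who holds the filter. The same property means anyone can also test guessed usernames against it.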


You can always generate a hash, regardless of whether or not you know the other authors' user names. You can't guarantee that it's a unique hash, though.

If you know all the user names in advance, you can generate a minimal perfect hash, but any time you add a user name you'll have to generate a completely new hash table--with different hashes. That's obviously not a good solution.

It depends on what you want your final keys to look like.

One possibility is to assign unique sequential IDs to the user names and then obfuscate those ids so that they don't look like sequential IDs. This is similar to what YouTube does with their IDs--they turn a 64-bit number into an 11-character base64 string. I wrote a little article about that, with code in C#. Check out http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=839.

And, yes, the process is reversible.
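As a rough illustration of a reversible obfuscation, the sketch below scrambles a 64-bit ID by multiplying with an odd constant modulo 2^64 and encodes the result as 11 URL-safe base64 characters. The scheme, the constant `MULTIPLIER`, and the function names are assumptions chosen for the example; this is not the method from the linked article, nor the scheme any particular site uses.

```python
import base64

MASK = (1 << 64) - 1
MULTIPLIER = 0x9E3779B97F4A7C15          # arbitrary odd constant (assumption)
INVERSE = pow(MULTIPLIER, -1, 1 << 64)   # modular inverse, Python 3.8+

def encode_id(n):
    """Turn a 64-bit integer into an 11-character URL-safe string."""
    scrambled = (n * MULTIPLIER) & MASK
    raw = scrambled.to_bytes(8, "big")
    # 8 bytes encode to 12 base64 characters, the last being '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_id(text):
    """Reverse encode_id and recover the original integer."""
    raw = base64.urlsafe_b64decode(text + "=")
    scrambled = int.from_bytes(raw, "big")
    return (scrambled * INVERSE) & MASK

token = encode_id(12345)
original = decode_id(token)  # round-trips back to 12345
```

This only hides the sequential pattern. Anyone who learns the constant can reverse it, so it is obfuscation rather than a one-way hash.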


It sounds like a single hash won't do you any good:

  1. You can't verify that a single username is in the hash; you would need to know all the usernames.
  2. You can't add a new user to the hash without knowing something about the unhashed usernames (for any good hash algorithm, the order in which you add users to the hash will matter).

For #2, a partial solution is to keep not the usernames themselves but something like an XOR of all the existing users. When you want to add a new user, XOR it into the existing value and re-hash the result. Because XOR is commutative and associative, it won't matter in which order you added the users.
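One way to read that suggestion is sketched below, assuming each username is first hashed with SHA-256 and the digests are XORed into an accumulator; the function names are hypothetical.

```python
import hashlib

def user_digest(username):
    """SHA-256 of a username as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(username.encode("utf-8")).digest(), "big")

def add_user(accumulator, username):
    # XOR is commutative and associative, so the order of additions is irrelevant.
    return accumulator ^ user_digest(username)

def published_hash(accumulator):
    """Re-hash the accumulator to get the value that is actually published."""
    return hashlib.sha256(accumulator.to_bytes(32, "big")).hexdigest()

acc_ab = add_user(add_user(0, "alice"), "bob")
acc_ba = add_user(add_user(0, "bob"), "alice")
order_independent = published_hash(acc_ab) == published_hash(acc_ba)
```

Two caveats with this sketch: whoever adds a user must hold the accumulator, not just the published hash, and adding the same name twice cancels it out. XOR-combined hashes are also not collision-resistant against someone who can choose the inputs.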

But the real solution, I think, is just to have a set of hashes, rather than a hash of a set. Is there a reason you can't do this? Then you can easily keep the set ordered or unordered as you wish, you can easily add users to the set, and easily check to see if a given author is already in the set.
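A set of hashes could be as simple as the following sketch, which assumes unsalted SHA-256 and uses hypothetical helper names:

```python
import hashlib

def hash_username(username):
    """One-way hash of a single username."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()

def add_author(author_hashes, username):
    # Only the hash is stored; no other author's name is needed to add one.
    author_hashes.add(hash_username(username))

def is_author(author_hashes, username):
    # Exact membership test with no false positives (barring hash collisions).
    return hash_username(username) in author_hashes

author_hashes = set()
add_author(author_hashes, "alice")
add_author(author_hashes, "bob")
found = is_author(author_hashes, "alice")  # True
```

Because usernames are easy to guess, anyone holding the set can test candidate names against it. That is what makes verification possible, but it also limits how much anonymity the hashes provide.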
