
Cumulative string collection hashing

Is there an algorithm available for Java that will let me keep adding String objects and removing old ones such that, if I add a String and later remove it, the integer hash returns to its previous value?

Edit: the strings in the hash are unique.

Some pseudocode:

h = hash
add(h, "hi!") == 51;
add(h, "hello again!") == 532;
rem(h, "hello again!") == 51;

I know you can do this with Java collections, but the default implementations iterate over the entire collection every time to compute the hash code, which is very inefficient for large collections. I don't mind using an external library if one exists.

Thanks in advance,

Chris


If you don't care about the hash algorithm being of cryptographic quality (cryptographic hash algorithms are very difficult to specify correctly; if you get it wrong, someone can cause a collision when you don't want them to), the following should work:

Consider the following code:

interface Accumulator<T, U>
{
    public void add(T t);
    public void subtract(T t);
    public U get();
}

class SumHasher implements Accumulator<String,Integer>
{
    private int accumulator = 0; // @Override is not allowed on fields
    @Override public void add(String t) { accumulator += t.hashCode(); }
    @Override public void subtract(String t) { accumulator -= t.hashCode(); }
    @Override public Integer get() { return accumulator; }
}

class XorHasher implements Accumulator<String,Integer>
{
    private int accumulator = 0; // @Override is not allowed on fields
    @Override public void add(String t) { accumulator ^= t.hashCode(); }
    @Override public void subtract(String t) { accumulator ^= t.hashCode(); }
    @Override public Integer get() { return accumulator; }
}
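To check the two classes against the question, here is a minimal sketch that replays the question's pseudocode. The wrapper class name `AccumulatorDemo` is hypothetical; the interface and classes are copied in as nested types so the file compiles on its own. The actual hash values differ from the 51 and 532 in the pseudocode, so the sketch only compares values before and after. It also relies on the documented contract of `java.util.Set.hashCode()` (the sum of the elements' hash codes) to show that `SumHasher` tracks the same number a `HashSet` would compute by iterating.

```java
import java.util.HashSet;
import java.util.Set;

public class AccumulatorDemo {
    // Same interface and classes as above, nested so this example is self-contained.
    interface Accumulator<T, U> {
        void add(T t);
        void subtract(T t);
        U get();
    }

    static class SumHasher implements Accumulator<String, Integer> {
        private int accumulator = 0;
        @Override public void add(String t) { accumulator += t.hashCode(); }
        @Override public void subtract(String t) { accumulator -= t.hashCode(); }
        @Override public Integer get() { return accumulator; }
    }

    static class XorHasher implements Accumulator<String, Integer> {
        private int accumulator = 0;
        @Override public void add(String t) { accumulator ^= t.hashCode(); }
        @Override public void subtract(String t) { accumulator ^= t.hashCode(); }
        @Override public Integer get() { return accumulator; }
    }

    public static void main(String[] args) {
        SumHasher sum = new SumHasher();
        XorHasher xor = new XorHasher();
        Set<String> set = new HashSet<>(); // only used to cross-check the running sum

        // Replay the question's pseudocode.
        sum.add("hi!");
        xor.add("hi!");
        set.add("hi!");
        int sumAfterHi = sum.get();
        int xorAfterHi = xor.get();

        sum.add("hello again!");
        xor.add("hello again!");
        set.add("hello again!");
        // Set.hashCode() is specified as the sum of the elements' hash codes,
        // so SumHasher reproduces it without iterating over the set.
        System.out.println(sum.get() == set.hashCode()); // true

        sum.subtract("hello again!");
        xor.subtract("hello again!");
        set.remove("hello again!");
        System.out.println(sum.get() == sumAfterHi);     // true
        System.out.println(xor.get() == xorAfterHi);     // true
        System.out.println(sum.get() == set.hashCode()); // true
    }
}
```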

What these have in common is that addition and XOR are both associative and commutative, and each has an inverse (subtraction undoes addition; XOR is its own inverse). That is what lets you perform them in any order and undo them in any order: if you add() each element of a Set<T> and then subtract() each element of the set (not necessarily in the same order), you are guaranteed to get 0. Java's int overflow does not break this, because int arithmetic simply wraps modulo 2^32.
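The order-independence claim can be checked directly. This sketch (the class and method names are hypothetical) adds the elements in one shuffled order and removes them in another, using inline sum and XOR accumulators rather than the classes above so it stays short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class OrderIndependence {
    /** Adds every element of addOrder, then removes every element of removeOrder; returns {sum, xor}. */
    static int[] addThenRemove(List<String> addOrder, List<String> removeOrder) {
        int sum = 0, xor = 0;
        for (String s : addOrder) {
            sum += s.hashCode(); // overflow wraps modulo 2^32; still commutative
            xor ^= s.hashCode();
        }
        for (String s : removeOrder) {
            sum -= s.hashCode();
            xor ^= s.hashCode(); // XOR is its own inverse
        }
        return new int[] { sum, xor };
    }

    public static void main(String[] args) {
        List<String> elements = List.of("alpha", "beta", "gamma", "delta", "epsilon");
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        for (int trial = 0; trial < 5; trial++) {
            List<String> addOrder = new ArrayList<>(elements);
            List<String> removeOrder = new ArrayList<>(elements);
            Collections.shuffle(addOrder, rnd);
            Collections.shuffle(removeOrder, rnd);
            System.out.println(Arrays.toString(addThenRemove(addOrder, removeOrder))); // [0, 0]
        }
    }
}
```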

There are certainly other operations that satisfy this property, but I'm not sure what they are. (Multiplication won't work in general: with Java's wrapping int arithmetic, only odd values have a multiplicative inverse modulo 2^32, so you would have to guarantee that every accumulated value is odd, and the accumulator would have to start at 1 rather than 0.) This answer used to use f(x,h) = ((x^h) + h)^h and g(x,h) = ((x^h) - h)^h as inverses, but these functions are not order-independent: accumulating elements in different orders gives different results.
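Both points can be verified with a short sketch. The helper `inverseOdd` is hypothetical; it computes the inverse of an odd int modulo 2^32 with Newton's iteration, which is one standard way to do it. The sketch confirms that g undoes f for a fixed h, shows f giving different results for two insertion orders, and runs a tiny multiplicative accumulator that starts at 1 and only ever multiplies by odd values.

```java
public class OtherOperations {
    // The pair this answer used to suggest.
    static int f(int x, int h) { return ((x ^ h) + h) ^ h; }
    static int g(int x, int h) { return ((x ^ h) - h) ^ h; }

    /** Inverse of an odd int modulo 2^32 via Newton's iteration (hypothetical helper). */
    static int inverseOdd(int a) {
        if ((a & 1) == 0) {
            throw new IllegalArgumentException("even values have no inverse modulo 2^32");
        }
        int inv = a;            // correct to 3 bits: a * a == 1 (mod 8) for every odd a
        for (int i = 0; i < 4; i++) {
            inv *= 2 - a * inv; // doubles the correct low bits: 3, 6, 12, 24, 48
        }
        return inv;
    }

    public static void main(String[] args) {
        // g really is the inverse of f for a fixed h...
        System.out.println(g(f(12345, 678), 678)); // 12345

        // ...but accumulating the same two values in different orders disagrees.
        System.out.println(f(f(0, 1), 2));         // 1
        System.out.println(f(f(0, 2), 1));         // 9

        // Multiplicative accumulator: start at 1, use only odd factors.
        int acc = 1;
        acc *= 3;
        acc *= 5;
        acc *= inverseOdd(3); // removing 3 before 5 is fine: multiplication commutes
        acc *= inverseOdd(5);
        System.out.println(acc);                   // 1
    }
}
```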

Edit: I did think of one other simple operation: a bitwise permutation (of which bitwise rotation is a special case) chosen based on an input value. In Java you can implement a left rotation as (x << k) | (x >>> (32-k)), where x is an int and k is an integer between 0 and 31 (for example, any 5 bits taken from another number). The >>> is not a typo: you need it because the regular >> sign-extends. Oops: this only works if the elements are removed in the reverse of the order in which they were added.
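The rotation formula and the reverse-order caveat can both be seen in a small sketch. The `combine`/`uncombine` pair below is a hypothetical scheme (rotate the accumulator by the low 5 bits of the element hash, then XOR the hash in); the answer does not spell out its exact scheme, so treat this only as an illustration of the ordering problem. The hand-written rotation is also compared with the JDK's `Integer.rotateLeft`.

```java
public class RotationDemo {
    // The formula from the answer; k must be in 0..31.
    static int rotl(int x, int k) { return (x << k) | (x >>> (32 - k)); }
    static int rotr(int x, int k) { return (x >>> k) | (x << (32 - k)); }

    // Hypothetical scheme: rotate by the low 5 bits of h, then XOR h in.
    static int combine(int a, int h) { return rotl(a, h & 31) ^ h; }
    static int uncombine(int a, int h) { return rotr(a ^ h, h & 31); }

    public static void main(String[] args) {
        // The hand-written rotation matches Integer.rotateLeft for every k in 0..31.
        boolean same = true;
        for (int k = 0; k < 32; k++) {
            same &= rotl(0xCAFEBABE, k) == Integer.rotateLeft(0xCAFEBABE, k);
        }
        System.out.println(same); // true

        int acc = combine(combine(0, 1), 2); // add h=1, then h=2
        System.out.println(uncombine(uncombine(acc, 2), 1)); // 0: removed in reverse order
        System.out.println(uncombine(uncombine(acc, 1), 2)); // nonzero: removed in insertion order
    }
}
```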

Edit 2: Finally, you could implement this approach more generally as follows:

abstract class AbstractHashCodeAccumulator<T> implements Accumulator<T, Integer>
{
    private int accumulator = 0;
    abstract protected int combine(int a, int h);
    abstract protected int uncombine(int a, int h);
    @Override public void add(T t) { accumulator = combine(accumulator, t.hashCode()); }
    @Override public void subtract(T t) { accumulator = uncombine(accumulator, t.hashCode()); }
    @Override public Integer get() { return accumulator; }
}

class SumHasher extends AbstractHashCodeAccumulator<String>
{
    @Override protected int combine(int a, int h)   { return a+h; }
    @Override protected int uncombine(int a, int h) { return a-h; }
}

class XorHasher extends AbstractHashCodeAccumulator<String>
{
    @Override protected int combine(int a, int h)   { return a^h; }
    @Override protected int uncombine(int a, int h) { return a^h; }
}
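One way to hook such an accumulator up to an actual collection is sketched below. `HashedSet` is a hypothetical wrapper, not a library class: it keeps a `HashSet<String>` and updates the running hash only when `add` or `remove` actually changes membership, so a duplicate add (which the question rules out, but which is easy to do by accident) cannot corrupt the sum. The abstract class is repeated with its closing braces in place so the file compiles on its own.

```java
import java.util.HashSet;
import java.util.Set;

public class HashedSetDemo {
    interface Accumulator<T, U> {
        void add(T t);
        void subtract(T t);
        U get();
    }

    static abstract class AbstractHashCodeAccumulator<T> implements Accumulator<T, Integer> {
        private int accumulator = 0;
        protected abstract int combine(int a, int h);
        protected abstract int uncombine(int a, int h);
        @Override public void add(T t) { accumulator = combine(accumulator, t.hashCode()); }
        @Override public void subtract(T t) { accumulator = uncombine(accumulator, t.hashCode()); }
        @Override public Integer get() { return accumulator; }
    }

    static class SumHasher extends AbstractHashCodeAccumulator<String> {
        @Override protected int combine(int a, int h) { return a + h; }
        @Override protected int uncombine(int a, int h) { return a - h; }
    }

    /** Hypothetical wrapper: a set whose running hash is updated in O(1) per change. */
    static class HashedSet {
        private final Set<String> elements = new HashSet<>();
        private final Accumulator<String, Integer> hasher;

        HashedSet(Accumulator<String, Integer> hasher) { this.hasher = hasher; }

        boolean add(String s) {
            boolean added = elements.add(s); // false if s was already present
            if (added) hasher.add(s);        // fold in only genuinely new elements
            return added;
        }

        boolean remove(String s) {
            boolean removed = elements.remove(s);
            if (removed) hasher.subtract(s); // undo only elements that were present
            return removed;
        }

        int runningHash() { return hasher.get(); }
    }

    public static void main(String[] args) {
        HashedSet set = new HashedSet(new SumHasher());
        set.add("hi!");
        int before = set.runningHash();
        set.add("hello again!");
        set.add("hello again!"); // duplicate: ignored, hash unchanged
        set.remove("hello again!");
        System.out.println(set.runningHash() == before); // true
        System.out.println(set.remove("never added"));   // false
    }
}
```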

The problem with this approach is that in some ways it is "un-hashlike": it relies on an orderly algebraic structure, whereas hashing generally calls for disorder/entropy/irreversibility.
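One concrete consequence of that orderliness is that structurally different sets collide easily. The sketch below (class and method names are hypothetical) uses the well-known fact that "Aa" and "BB" share a `String.hashCode()` of 2112, and that the empty string hashes to 0.

```java
public class StructuralCollisions {
    static int xorOf(String... items) {
        int acc = 0;
        for (String s : items) acc ^= s.hashCode();
        return acc;
    }

    static int sumOf(String... items) {
        int acc = 0;
        for (String s : items) acc += s.hashCode();
        return acc;
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' == 'B' * 31 + 'B' == 2112
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112

        // XOR: equal hashes cancel, so {"Aa", "BB"} looks like the empty set.
        System.out.println(xorOf("Aa", "BB") == xorOf()); // true

        // Sum: {"Aa", "x"} and {"BB", "x"} are indistinguishable.
        System.out.println(sumOf("Aa", "x") == sumOf("BB", "x")); // true

        // Both: "" hashes to 0, so adding it changes nothing.
        System.out.println(sumOf("") == sumOf() && xorOf("") == xorOf()); // true
    }
}
```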
