
Algorithm to get a unique and consistent hash code for an object when the application is run multiple times

I'm using Java. Is there any algorithm that gives me a unique hash code that is also the same every time I run the application, so that hash code collisions are avoided?

I know that for equal objects the JVM returns the same hash code, and for different objects it may return the same or a different hash code. But I want some logic that generates a unique hash code for every object.

Unique means that the hash code of one object should not collide with any other object's hash code, and same means that when I run the application multiple times, it should return the same hash code it returned previously.


The default hash code function in Java (Object.hashCode()) may return different hash codes on each JVM invocation, because it is allowed to take the memory address of the object, mangle it, and return it.

Relying on it for data-holding objects is, however, not good coding practice, since objects that are equal must always return the same hash code! Please read about the hashCode contract to learn more. Many value classes in Java, such as String and Integer, already implement hashCode in a way that returns the same value on every JVM invocation.
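
To illustrate the difference: the Javadoc of String.hashCode() specifies the value as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, so it depends only on the characters and is the same on every run, whereas Object.hashCode() gives no such guarantee. The sketch below recomputes the documented formula by hand; the class name StableHashDemo is just for illustration.

```java
class StableHashDemo {
    // Recomputes String.hashCode() from its documented formula:
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], with int overflow.
    static int manualStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // Horner's rule form of the same polynomial
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hello";
        // Same value on every run, because it depends only on the characters.
        System.out.println(s.hashCode() + " " + manualStringHash(s)); // prints "99162322 99162322"
        // Default Object.hashCode(): no guarantee it is the same on the next run.
        System.out.println(new Object().hashCode());
    }
}
```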

To keep it simple: all your data-holding objects that might be stored in some collection should have equals and hashCode implementations. If you code with Eclipse or any other reasonable IDE, you can use a wizard that generates both methods automatically.
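
A hand-written equivalent of what such a wizard typically produces might look like this, for a hypothetical Person class with name and age fields. Objects.hash combines the field hash codes; since String and Integer hash codes are deterministic, the result is the same on every run for the same field values.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class used only for illustration.
final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Built only from the fields used in equals(), so equal objects always
        // get equal hash codes, and the value does not depend on memory addresses.
        return Objects.hash(name, age);
    }
}

class PersonDemo {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Ann", 30));
        people.add(new Person("Ann", 30)); // equal to the first, so not added again
        System.out.println(people.size()); // prints 1
    }
}
```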

And while we are at it: It is IMHO good practice to also implement the Comparable<T> interface, so you can use the objects within SortedSets and TreeMaps, too.
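
A minimal sketch of that advice, using a hypothetical Version class: compareTo orders by major, then minor, and is kept consistent with equals so that a TreeSet and a HashSet agree on which elements are duplicates.

```java
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical value class used only for illustration.
final class Version implements Comparable<Version> {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    @Override
    public int compareTo(Version other) {
        // Compare major first, then minor; consistent with equals() below.
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Version)) return false;
        Version v = (Version) o;
        return major == v.major && minor == v.minor;
    }

    @Override
    public int hashCode() {
        return Objects.hash(major, minor);
    }

    @Override
    public String toString() {
        return major + "." + minor;
    }
}

class VersionDemo {
    public static void main(String[] args) {
        TreeSet<Version> versions = new TreeSet<>();
        versions.add(new Version(2, 0));
        versions.add(new Version(1, 10));
        versions.add(new Version(1, 2));
        System.out.println(versions); // prints [1.2, 1.10, 2.0]
    }
}
```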

While we are at it: if others should use your objects, don't forget Serializable and Cloneable.


Unique means that hashcode of one object should not collide with any other object's hashcode. Same means when I run the application multiple times, it should return me the same hash code whatever it returned me previously.

It is impossible to meet these requirements for a number of reasons:

  • It is not possible to guarantee that hashcodes are unique. Whatever you do in your class's hashCode method, some other class's hashCode method may return, for some instance, the same value as the hashcode of one of your instances.

  • It is impossible to guarantee that hashcodes are unique across application runs even just for instances of your class.
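
A concrete illustration of the first point: hashCode() returns an int, so there are only 2^32 possible values, and by the pigeonhole principle distinct objects must eventually share one. Even short strings collide ("Aa" and "BB" have the same String.hashCode()), and a different class can produce the same value, since Integer.hashCode() is documented to return the wrapped int itself.

```java
class CollisionDemo {
    public static void main(String[] args) {
        // 'A' = 65, 'a' = 97:  65 * 31 + 97 = 2112
        // 'B' = 66, 'B' = 66:  66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode());                  // prints 2112
        System.out.println("BB".hashCode());                  // prints 2112
        System.out.println("Aa".equals("BB"));                // prints false
        // A different class entirely, same hash code.
        System.out.println(Integer.valueOf(2112).hashCode()); // prints 2112
    }
}
```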

The second requires justification. The way to create a unique hashcode is to do something like this:

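    import java.util.HashSet;
    import java.util.IdentityHashMap;
    import java.util.Map;
    import java.util.Objects;
    import java.util.Set;

    class YourClass {
        // Shared registry: never cleared and not thread-safe.
        static final Set<Integer> usedCodes = new HashSet<>();
        static final Map<YourClass, Integer> codeMap = new IdentityHashMap<>();

        private final String name; // hypothetical example field

        YourClass(String name) { this.name = name; }

        @Override
        public int hashCode() {
            Integer code = codeMap.get(this);   // IdentityHashMap: no recursion into hashCode()
            if (code == null) {
                code = Objects.hash(name);      // value-based starting point
                while (usedCodes.contains(code)) {
                    code = code + 1;            // simple deterministic "rehash"
                }
                usedCodes.add(code);
                codeMap.put(this, code);
            }
            return code;
        }
    }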

This gives the hashcodes with the desired uniqueness property, but the sameness property is not guaranteed ... unless the application always generates / accesses the hashcodes for all objects in the same order.
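
The order dependence is easy to demonstrate. The sketch below is a simplified variant of the allocator above, keyed by String value in a HashMap instead of by identity, and using + 1 as the probing step (both are assumptions for illustration). "Aa" and "BB" share the starting value 2112, so whichever is seen first gets 2112 and the other gets 2113.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified allocator: unique codes, assigned first-come, first-served.
class UniqueCodeAllocator {
    private final Set<Integer> usedCodes = new HashSet<>();
    private final Map<String, Integer> codeMap = new HashMap<>();

    int codeFor(String key) {
        Integer code = codeMap.get(key);
        if (code == null) {
            code = key.hashCode();          // value-based starting point
            while (usedCodes.contains(code)) {
                code = code + 1;            // assumed probing step ("rehash")
            }
            usedCodes.add(code);
            codeMap.put(key, code);
        }
        return code;
    }

    // Simulates one application run that sees the keys in the given order.
    static Map<String, Integer> assignInOrder(List<String> keys) {
        UniqueCodeAllocator allocator = new UniqueCodeAllocator();
        Map<String, Integer> result = new HashMap<>();
        for (String k : keys) {
            result.put(k, allocator.codeFor(k));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same keys, different order, different code for "Aa".
        System.out.println(assignInOrder(Arrays.asList("Aa", "BB")).get("Aa")); // prints 2112
        System.out.println(assignInOrder(Arrays.asList("BB", "Aa")).get("Aa")); // prints 2113
    }
}
```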

The only way to get this to work would be to persist the usedCodes and codeMap data structures in a suitable form. Even (just) storing the unique hashcodes as part of the persisted objects is not sufficient, because there is a risk that the application may reissue a hashcode to a newly created object before reading the existing object that has the hashcode.
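
One way to do that is to save the full key-to-code map and reload it before any new code is issued. The sketch below uses java.util.Properties, with an in-memory StringWriter/StringReader standing in for a real file or database table. The PersistentCodeRegistry class is hypothetical, and it keys by a String value (for example a URL) rather than by object identity, since identity does not survive a restart.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical registry whose state is persisted between runs.
class PersistentCodeRegistry {
    private final Map<String, Integer> codeMap = new HashMap<>();
    private final Set<Integer> usedCodes = new HashSet<>();

    // Must be called before codeFor(), so previously issued codes are never reused.
    void load(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : props.stringPropertyNames()) {
            int code = Integer.parseInt(props.getProperty(key));
            codeMap.put(key, code);
            usedCodes.add(code);
        }
    }

    void save(Writer out) {
        Properties props = new Properties();
        for (Map.Entry<String, Integer> e : codeMap.entrySet()) {
            props.setProperty(e.getKey(), Integer.toString(e.getValue()));
        }
        try {
            props.store(out, null); // Properties escapes ':' and '=' in keys
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int codeFor(String key) {
        Integer code = codeMap.get(key);
        if (code == null) {
            code = key.hashCode();
            while (usedCodes.contains(code)) {
                code = code + 1; // assumed probing step
            }
            usedCodes.add(code);
            codeMap.put(key, code);
        }
        return code;
    }

    public static void main(String[] args) {
        // "Run 1": issue codes in one order and persist them.
        PersistentCodeRegistry run1 = new PersistentCodeRegistry();
        run1.codeFor("Aa"); // 2112
        run1.codeFor("BB"); // 2113
        StringWriter storage = new StringWriter();
        run1.save(storage);

        // "Run 2": reload first, then ask in the opposite order.
        PersistentCodeRegistry run2 = new PersistentCodeRegistry();
        run2.load(new StringReader(storage.toString()));
        System.out.println(run2.codeFor("BB")); // prints 2113, same as run 1
        System.out.println(run2.codeFor("Aa")); // prints 2112, same as run 1
    }
}
```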

Finally, it should be noted that you have to be careful with using identity hashcodes anywhere in the solution. Identity hashcodes are not guaranteed to be unique, nor to be the same across different runs of an application. Indeed, if there are differences in any inputs, or if there is any non-determinism, it is highly likely that a given object will have a different identity hashcode value each time you run the application.

FOLLOW UP

Suppose I am storing millions of URLs in a database. When retrieving these URLs, I want to generate a unique hashcode that will make searching faster.

You need to store the hashcodes in a separate column of the table. But given the constraints discussed above, I don't see how this is going to make search faster. Basically you have to search the database for the URL in order to work out its unique hashcode.

I think you are better off using hashcodes that are not unique with a small probability. If you use a good enough "cryptographic" hashing function and a large enough hash size you can (in theory) make the probability of collision arbitrarily small ... but not zero.
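
A minimal sketch of that approach: hash each URL with SHA-256 through the standard java.security.MessageDigest API and keep the first 8 bytes as a long, which could be stored in an indexed column. The class name, the 64-bit truncation and the UTF-8 encoding are assumptions for illustration. With 64 bits, the birthday bound puts the chance of any collision among, say, 10 million URLs at roughly n^2 / 2^65 = 10^14 / (3.7 x 10^19), about 3 x 10^-6: small, but not zero.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class UrlHash {
    // Returns the first 64 bits of SHA-256(url) as a long.
    // Deterministic across runs and JVMs; collisions are unlikely but possible.
    static long hash64(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest, 0, 8).getLong(); // big-endian
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hash64("https://example.com/page")); // same value on every run
    }
}
```

Because collisions remain possible, a lookup would still compare the URL text on the (rare) rows whose hash matches, e.g. a query of the form WHERE url_hash = ? AND url = ?, assuming hypothetical columns url_hash and url.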


Based on my understanding of your question...

If it is your custom object, then you can override the hashCode method (along with equals) to get a consistent hashcode based on the instance variables of your class. You can even return a constant hashcode; it will still satisfy the hashCode contract, although every object will then land in the same hash bucket.
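
To see why a constant satisfies the contract (equal objects must have equal hash codes; unequal objects are allowed to share one), consider this hypothetical Tag class. A HashSet still behaves correctly, but since every element hashes to the same bucket, lookups lose the performance benefit of hashing.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class with a legal but deliberately poor hashCode().
final class Tag {
    private final String label;

    Tag(String label) {
        this.label = label;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Tag && Objects.equals(label, ((Tag) o).label);
    }

    @Override
    public int hashCode() {
        return 42; // same for every instance, so equal objects trivially agree
    }
}

class TagDemo {
    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        tags.add(new Tag("java"));
        tags.add(new Tag("hash"));
        tags.add(new Tag("java")); // equal to an existing element, not added
        System.out.println(tags.size());                    // prints 2
        System.out.println(tags.contains(new Tag("hash"))); // prints true
    }
}
```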
