Un-overriding hashCode

I have the following situation: I have many BSTs, and I want to merge isomorphic subtrees to save space.

I am hashing Binary Search Tree nodes into a "unique table", basically a hash set of BST nodes.

Nodes that have the same left and right child and the same key have the same hash code, and I have overridden equals for the node class appropriately.

Everything works, except that computing the hash is expensive - it involves computing the hash for the child nodes.

I would like to cache the hashed value for a node. The problem is that the natural way of doing this, a HashMap from nodes to integers, itself calls the hash function on the nodes.

I've gotten around this by declaring a new field in the nodes, which I use to store the hash code. However, I feel this is not the right solution.

What I really want is to map nodes to their hash codes using a hash which uses the node's address. I thought I could do this by making a HashMap keyed on Object and casting the nodes to Object, which would then invoke Object's hashCode method, but this didn't work (inserts into the collection still call the node's hash and equality functions).

I would appreciate insight into the best way of implementing the node-to-hash-code cache. I've attached code below illustrating what's going on.

import java.util.Set;
import java.util.HashSet;
import java.util.Map;
import java.util.HashMap;

class Bst {

  int key;
  String name;
  Bst left;
  Bst right;

  public Bst( int k, String name, Bst l, Bst r ) {
    this.key = k;
    this.name = name;
    this.left = l;
    this.right = r;
  }

  public String toString() {
    String l = "";
    String r = "";
    if ( left != null ) {
      l = left.toString();
    }
    if ( right != null ) {
      r = right.toString();
    }
    return key + ":" + name + ":" + l + ":" + r;
  }

  @Override
  public boolean equals( Object o ) {
    System.out.println("calling Bst's equals");
    // instanceof is false for null, so this also rejects o == null
    if ( !(o instanceof Bst) ) {
      return false;
    }
    Bst n = (Bst) o;

    if ( n.key != key ) {
      return false;
    } else if ( n.left != null && left == null || n.right != null && right == null ||
                n.left == null && left != null || n.right == null && right != null ) {
      // the two nodes do not have children in the same positions
      return false;
    } else if ( n.left != null && n.right == null ) {
      return n.left.equals( left );
    } else if ( n.left != null && n.right != null ) {
      return n.left.equals( left ) && n.right.equals( right );
    } else if ( n.left == null && n.right != null ) {
      return n.right.equals( right );
    } else {
      return true;
    }
  }

  @Override
  public int hashCode() {
    // the real hash function is more complex, entails
    // calling hashCode on children if they are not null
    System.out.println("calling Bst's hashCode");
    return key;
  }
}

public class Hashing {

  static void p(String s) { System.out.println(s); }

  public static void main( String [] args ) {
    Set<Bst> aSet = new HashSet<Bst>();
    Bst a = new Bst(1, "a", null, null );
    Bst b = new Bst(2, "b", null, null );
    Bst c = new Bst(3, "c", null, null );
    Bst d = new Bst(1, "d", null, null );

    a.left = b;
    a.right = c;
    d.left = b;
    d.right = c;

    aSet.add( a );
    if ( aSet.contains( d ) ) {
      p("d is a member of aSet");
    } else {
      p("d is not a member of aSet");
    }

    if ( a.equals( d ) ) {
      p("a and d are equal");
    } else {
      p("a and d are not equal");
    }

    // now try casts to Object to avoid calling Bst's hashCode and equals
    Set<Object> bSet = new HashSet<Object>();
    Object foo = new Bst( a.key, a.name, a.left, a.right );
    Object bar = new Bst( a.key, a.name, a.left, a.right );
    bSet.add( foo );
    p("added foo");
    if ( bSet.contains( bar ) ) {
      p("bar is a member of bSet");
    } else {
      p("bar is not a member of bSet");
    }
  }
}


Storing the hash in a field in the node feels like exactly the right solution to me. It's also what java.lang.String does for its own hash code. Aside from anything else, it means you can't end up with cache entries for objects that could otherwise be garbage-collected.
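
A minimal sketch of the field-based approach, assuming the nodes can be made immutable. The class name ImmutableBst is hypothetical, the name field is dropped (the question's equals ignores it anyway), and Objects.hash stands in for the question's "more complex" hash. java.lang.String computes its hash lazily on first use; this sketch computes it eagerly in the constructor instead, which is only safe because every field is final. Because children are built before their parents, each child's hash is already stored, so hashing a node costs constant work rather than a walk over its subtree.

```java
import java.util.Objects;

// Demo class first so single-file launchers find main().
class CachedHashDemo {
  public static void main(String[] args) {
    ImmutableBst a = new ImmutableBst(1,
        new ImmutableBst(2, null, null), new ImmutableBst(3, null, null));
    ImmutableBst d = new ImmutableBst(1,
        new ImmutableBst(2, null, null), new ImmutableBst(3, null, null));
    System.out.println(a.equals(d) + " " + (a.hashCode() == d.hashCode())); // prints "true true"
  }
}

// Hypothetical immutable variant of the question's Bst.
final class ImmutableBst {
  final int key;
  final ImmutableBst left;   // may be null
  final ImmutableBst right;  // may be null
  private final int hash;    // cached once; cannot go stale because all fields are final

  ImmutableBst(int key, ImmutableBst left, ImmutableBst right) {
    this.key = key;
    this.left = left;
    this.right = right;
    // Children are constructed first, so left.hashCode() and right.hashCode()
    // are plain field reads here, not recursive walks of the subtrees.
    this.hash = Objects.hash(key, left, right);
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ImmutableBst)) return false;
    ImmutableBst n = (ImmutableBst) o;
    // Compare cached hashes first: unequal subtrees are usually rejected
    // without recursing into the children.
    return hash == n.hash
        && key == n.key
        && Objects.equals(left, n.left)
        && Objects.equals(right, n.right);
  }
}
```

Checking the cached hashes before recursing also makes equals cheap in the common "not equal" case, which matters when probing a unique table.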

If you really want the value that the implementation in Object would return for hashCode, you can use System.identityHashCode. You shouldn't rely on this (or any other hash code) being unique, though.
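
A small sketch contrasting the two (KeyedNode is a hypothetical stand-in for Bst): the overridden hashCode is value-based, while System.identityHashCode returns what Object.hashCode would have returned, regardless of the override. The numbers printed on the last line vary between runs and are not guaranteed to differ.

```java
class IdentityHashDemo {
  public static void main(String[] args) {
    KeyedNode x = new KeyedNode(7);
    KeyedNode y = new KeyedNode(7);
    // Overridden, value-based hash: equal keys give equal hash codes.
    System.out.println(x.hashCode() == y.hashCode());                              // true
    // identityHashCode bypasses the override and is stable for a given object.
    System.out.println(System.identityHashCode(x) == System.identityHashCode(x)); // true
    // Distinct objects usually, but not necessarily, get different identity hashes.
    System.out.println(System.identityHashCode(x) + " vs " + System.identityHashCode(y));
  }
}

// Hypothetical stand-in for Bst with a value-based hashCode/equals.
class KeyedNode {
  final int key;
  KeyedNode(int key) { this.key = key; }
  @Override public int hashCode() { return key; }
  @Override public boolean equals(Object o) {
    return o instanceof KeyedNode && ((KeyedNode) o).key == key;
  }
}
```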

One other point: your tree is mutable at the moment by virtue of the fields being package access. If you cache the hash code the first time you call it, you won't "notice" if it would have changed due to fields changing. Basically you shouldn't change a node after you've used its hash code.
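
To illustrate the staleness problem, here is a sketch of a mutable node that caches its hash lazily on the first call. LazyBst and freshHash are hypothetical names, and the 31-based combination is an assumption rather than the question's real hash. After a child's key changes, the cached value no longer matches a fresh computation.

```java
class StaleHashDemo {
  public static void main(String[] args) {
    LazyBst child = new LazyBst(2, null, null);
    LazyBst root = new LazyBst(1, child, null);
    int before = root.hashCode();  // computes and caches the hash
    child.key = 5;                 // mutate a node after its ancestor was hashed
    System.out.println(root.hashCode() == before);            // true: cached value returned
    System.out.println(root.hashCode() == root.freshHash());  // false: the cache is stale
  }
}

// Hypothetical mutable node that caches its hash lazily on first use.
// equals is left as Object's identity check to keep the sketch short.
class LazyBst {
  int key;                 // package access, like the question's Bst
  LazyBst left, right;
  private int cachedHash;
  private boolean hashComputed;

  LazyBst(int key, LazyBst left, LazyBst right) {
    this.key = key;
    this.left = left;
    this.right = right;
  }

  // Recomputes the hash from scratch on every call (the expensive path).
  int freshHash() {
    int h = key;
    h = 31 * h + (left == null ? 0 : left.freshHash());
    h = 31 * h + (right == null ? 0 : right.freshHash());
    return h;
  }

  @Override
  public int hashCode() {
    if (!hashComputed) {
      cachedHash = freshHash();
      hashComputed = true;
    }
    return cachedHash;
  }
}
```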


Java's built-in IdentityHashMap does what you're describing.
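
A sketch of using IdentityHashMap as an external hash cache (CountingNode and cachedHash are hypothetical names). IdentityHashMap compares keys with == and hashes them with System.identityHashCode, so get and put never call the node's own hashCode or equals; the counter shows the expensive hash running only once.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityCacheDemo {
  // IdentityHashMap compares keys with == and hashes them with
  // System.identityHashCode, so get/put never call CountingNode's
  // own hashCode or equals.
  static final Map<CountingNode, Integer> cache = new IdentityHashMap<>();

  static int cachedHash(CountingNode n) {
    Integer h = cache.get(n);
    if (h == null) {
      h = n.hashCode();   // the expensive call, made once per node
      cache.put(n, h);
    }
    return h;
  }

  public static void main(String[] args) {
    CountingNode n = new CountingNode(42);
    cachedHash(n);
    cachedHash(n);
    cachedHash(n);
    System.out.println(CountingNode.hashCalls);  // prints 1
  }
}

// Hypothetical node whose hashCode counts how often it is called.
class CountingNode {
  static int hashCalls = 0;
  final int key;
  CountingNode(int key) { this.key = key; }
  @Override public int hashCode() { hashCalls++; return key; }
  @Override public boolean equals(Object o) {
    return o instanceof CountingNode && ((CountingNode) o).key == key;
  }
}
```

Note that the map holds strong references to its keys, so cached nodes stay reachable for as long as the map does, which is the cache-entry problem Jon Skeet's answer mentions.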

That said, Jon Skeet's answer sounds more like the right way to go.


Storing the hash in a field is effectively "caching" the value so that it does not have to be recomputed on every call.

It's not necessarily bad practice, but you have to make sure that you clear or recompute it correctly whenever there is a change, which can be daunting if the change has to be propagated up or down a complex graph or tree.
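
A sketch of what that invalidation might look like, assuming each node keeps a list of its parents. The question's Bst has no parent links; InvalidatingNode, setLeft and hashIsCached are hypothetical, and only a left child is modelled to keep it short. Because merged subtrees can be shared, a node may have several parents, so the invalidation walks all of them.

```java
import java.util.ArrayList;
import java.util.List;

class InvalidationDemo {
  public static void main(String[] args) {
    InvalidatingNode mid = new InvalidatingNode(2);
    InvalidatingNode root = new InvalidatingNode(1);
    mid.setLeft(new InvalidatingNode(3));
    root.setLeft(mid);
    System.out.println(root.hashCode());     // 186, now cached in root and mid
    mid.setLeft(new InvalidatingNode(4));    // clears mid's and root's caches
    System.out.println(root.hashIsCached()); // false
    System.out.println(root.hashCode());     // 217, recomputed
  }
}

// Hypothetical node with parent links so a change can be pushed upwards.
// Only a left child is modelled to keep the sketch short.
class InvalidatingNode {
  private final int key;
  private InvalidatingNode left;
  // A shared (merged) subtree can have several parents.
  private final List<InvalidatingNode> parents = new ArrayList<>();
  private int cachedHash;
  private boolean valid;

  InvalidatingNode(int key) { this.key = key; }

  void setLeft(InvalidatingNode child) {
    if (left != null) left.parents.remove(this);
    left = child;
    if (child != null) child.parents.add(this);
    invalidate();
  }

  boolean hashIsCached() { return valid; }

  // Clears this node's cached hash and, transitively, every ancestor's.
  private void invalidate() {
    // A valid parent only ever has valid children, so if this node is
    // already invalid its ancestors are too and we can stop early.
    if (!valid) return;
    valid = false;
    for (InvalidatingNode p : parents) p.invalidate();
  }

  @Override
  public int hashCode() {
    if (!valid) {
      cachedHash = 31 * key + (left == null ? 0 : left.hashCode());
      valid = true;
    }
    return cachedHash;
  }
}
```

This keeps the cache consistent with the tree, but it does not make it safe to change a node while it is stored as a key in a HashSet or HashMap; a hash that changes during that time still breaks the collection.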

If you want a hash code computed by the JVM, you can use System.identityHashCode(x). It returns the same value that Object.hashCode would return for x, even when x's class overrides hashCode. The value is implementation-specific; some JVMs derive it from the object's memory address, but you should not assume that.


What I really want is to map nodes to their hash codes using a hash which uses the node's address.

What do you mean by the node's address? There is no such concept in Java, and as far as I know there is no unique identifier for objects comparable to the physical address in non-VM languages such as C++. References in Java are not memory addresses, and the GC may relocate objects in memory at any time.

I thought I could do this by making a HashMap keyed on Object and casting the nodes to Object, which would then invoke Object's hashCode method, but this didn't work

Indeed: hashCode is virtual and is overridden in your node class, so the subclass implementation is always called, regardless of the static type of the reference you hold.
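
A minimal sketch of that dispatch behaviour (ValueNode is a hypothetical stand-in for Bst): declaring the references as Object does not change which hashCode and equals a HashSet ends up calling.

```java
import java.util.HashSet;
import java.util.Set;

class DispatchDemo {
  public static void main(String[] args) {
    Object foo = new ValueNode(1);  // static type is Object...
    Object bar = new ValueNode(1);  // ...but the runtime class is still ValueNode
    Set<Object> set = new HashSet<>();
    set.add(foo);
    // HashSet calls bar.hashCode() and equals(), which dispatch to ValueNode's
    // overrides, so two distinct objects are treated as the same element.
    System.out.println(set.contains(bar));  // true
    System.out.println(foo == bar);         // false: they are different objects
  }
}

// Hypothetical stand-in for Bst with a value-based hashCode/equals.
class ValueNode {
  final int key;
  ValueNode(int key) { this.key = key; }
  @Override public int hashCode() { return key; }
  @Override public boolean equals(Object o) {
    return o instanceof ValueNode && ((ValueNode) o).key == key;
  }
}
```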

I am afraid any attempt to cache hash values in a map that relies on the keys' own hashCode runs into the same chicken-and-egg problem: as you mention, the map needs the hash value itself first.

I don't see any better way than caching the hash values within the nodes, as you did. You need to ensure, though, that the cached values are invalidated whenever the child nodes change. Correction: as Jon's answer points out, changing the hash code of an object after it has been stored in a map breaks the map's internal integrity, so that must not happen in the first place.
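
A minimal sketch of that failure mode (MutableKey is a hypothetical class): once the key's hash changes, the set can no longer find the element, even though it still holds it.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenSetDemo {
  public static void main(String[] args) {
    Set<MutableKey> set = new HashSet<>();
    MutableKey k = new MutableKey(1);
    set.add(k);
    k.key = 2;   // hash changes while k is stored in the set
    // The lookup uses the new hash, which does not match the one recorded at insertion.
    System.out.println(set.contains(k));  // false
    System.out.println(set.size());       // 1: the entry is still there, just unreachable
  }
}

// Hypothetical mutable node: key is not final, and hashCode depends on it.
class MutableKey {
  int key;
  MutableKey(int key) { this.key = key; }
  @Override public int hashCode() { return key; }
  @Override public boolean equals(Object o) {
    return o instanceof MutableKey && ((MutableKey) o).key == key;
  }
}
```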
