
What is the standard idiom for implementing equals and hashCode in Scala?

What is the standard idiom for implementing the equals and hashCode methods in Scala?

I know the preferred approach is discussed in Programming in Scala, but I don't currently have access to the book.


There's a free 1st edition of PinS that discusses this subject just as well. However, I think the best source is this article by Odersky discussing equality in Java. The discussion in PinS is, iirc, an abbreviated version of this article.


After doing quite a bit of research, I couldn't find an answer with an essential and correct implementation of the equals and hashCode pattern for a Scala class (not a case class, since the compiler generates equals and hashCode for case classes and they should not be overridden). I did find a (stale) Scala 2.10 macro making a valiant attempt at solving this.

I ended up combining the patterns recommended in "Effective Java, 2nd Edition" (by Joshua Bloch) and in this article "How to Write an Equality Method in Java" (by Martin Odersky, Lex Spoon, and Bill Venners), and have created a standard default pattern I now use to implement equals and hashCode for my Scala classes.

The primary goal of the equals pattern is to minimize the number of comparisons that must execute before reaching a valid and definitive true or false.

Additionally, the hashCode method should ALWAYS be overridden and re-implemented whenever the equals method is overridden (again, see "Effective Java, 2nd Edition" by Joshua Bloch). Hence the code below also includes a hashCode "pattern", which incorporates critical advice about using ## instead of hashCode in the actual implementation.
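To see why ## rather than hashCode matters, here is a small sketch (the HashDemo object is hypothetical, for illustration only). Scala's == compares boxed numbers cooperatively, so 1 and 1.0 typed as Any are equal, and ## is specified to stay consistent with that; hashCode is the plain Java method and is not. ## also returns 0 for null, where hashCode would throw a NullPointerException.

```scala
object HashDemo {
  // Typed as Any so that == uses Scala's cooperative equality on boxed numbers.
  val anInt: Any = 1
  val aDouble: Any = 1.0

  // Scala considers these two values equal...
  val valuesEqual: Boolean = anInt == aDouble

  // ...and ## agrees with that equality...
  val hashesViaHashHash: (Int, Int) = (anInt.##, aDouble.##)

  // ...whereas hashCode delegates to java.lang.Integer / java.lang.Double,
  // which produce different values for 1 and 1.0.
  val hashesViaHashCode: (Int, Int) = (anInt.hashCode(), aDouble.hashCode())

  // ## is null-safe: it yields 0 instead of throwing.
  val nullName: String = null
  val nullHash: Int = nullName.##

  def main(args: Array[String]): Unit = {
    println(s"1 == 1.0: $valuesEqual")
    println(s"## values: $hashesViaHashHash")
    println(s"hashCode values: $hashesViaHashCode")
    println(s"null.##: $nullHash")
  }
}
```

A class whose equals compares a field with == but hashes that field with .hashCode can therefore end up with equal instances that have different hash codes.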

It is worth mentioning that super.equals and super.hashCode must each be called only if an ancestor has already overridden it. If not, it is imperative NOT to call super.*, because the default implementations in java.lang.Object (equals returns true only for the very same instance, and hashCode is an identity hash that, depending on the JVM, may be derived from the object's memory address) would break the specified equals and hashCode contract for the now overridden methods.
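A small sketch of the failure mode described above, using a hypothetical BrokenPoint class that chains to super.* even though its only ancestor is AnyRef (java.lang.Object). Two instances with identical fields end up unequal, because Object.equals only returns true for the very same instance.

```scala
// Hypothetical class that WRONGLY chains to super.* although no ancestor
// overrides equals or hashCode, so both calls reach java.lang.Object.
class BrokenPoint(val x: Int, val y: Int) {
  override def equals(that: Any): Boolean =
    that match {
      case point: BrokenPoint =>
        super.equals(point) && // Object.equals: true only for the same instance
          x == point.x &&
          y == point.y
      case _ =>
        false
    }

  override def hashCode(): Int =
    31 * (31 * super.hashCode() + x.##) + y.## // Object.hashCode: identity-based
}

object BrokenPointDemo {
  val a = new BrokenPoint(1, 2)
  val b = new BrokenPoint(1, 2)

  def main(args: Array[String]): Unit = {
    println(a == a) // true: same instance
    println(a == b) // false, even though every field is equal
  }
}
```

Because the hash also mixes in Object's identity hash, such instances will generally land in different buckets of a hash-based collection as well.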

class Person(val name: String, val age: Int) extends Equals {
  override def canEqual(that: Any): Boolean =
    that.isInstanceOf[Person]

  //Intentionally avoiding the call to super.equals because no ancestor has overridden equals (see note 7 below)
  override def equals(that: Any): Boolean =
    that match {
      case person: Person =>
        (     (this eq person)                     //optional, but highly recommended unless you know this exact class implementation very well
          ||  (     person.canEqual(this)          //optional only if this class is marked final
                &&  (hashCode == person.hashCode)  //optional; very efficient if hashCode is cached, at the cost of extra space per instance
                &&  (     (name == person.name)
                      &&  (age == person.age)
                    )
              )
        )
      case _ =>
        false
    }

  //Intentionally avoiding the call to super.hashCode because no ancestor has overridden hashCode (see note 7 below)
  override def hashCode(): Int =
    31 * (
      name.##
    ) + age.##
}
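A quick usage check of the Person class above, repeated in condensed form so the snippet compiles on its own (the PersonDemo driver object is hypothetical). It shows two distinct instances with the same fields comparing equal, producing the same ##, collapsing to one element in a HashSet, and null being rejected by the case _ branch.

```scala
import scala.collection.mutable

// The Person class from above, condensed (same logic, shorter comments).
class Person(val name: String, val age: Int) extends Equals {
  override def canEqual(that: Any): Boolean =
    that.isInstanceOf[Person]

  override def equals(that: Any): Boolean =
    that match {
      case person: Person =>
        (this eq person) ||
          (person.canEqual(this) &&
            hashCode() == person.hashCode() &&
            name == person.name &&
            age == person.age)
      case _ =>
        false // also handles null, which never matches a type pattern
    }

  override def hashCode(): Int =
    31 * name.## + age.##
}

object PersonDemo {
  val alice1 = new Person("Alice", 30)
  val alice2 = new Person("Alice", 30)
  val bob = new Person("Bob", 30)

  // A HashSet relies on equals and hashCode agreeing, so the two Alices collapse.
  val people: mutable.HashSet[Person] = mutable.HashSet(alice1, alice2, bob)

  def main(args: Array[String]): Unit = {
    println(alice1 == alice2)       // true: distinct instances, same fields
    println(alice1.## == alice2.##) // true: equal objects have equal hashes
    println(alice1 == bob)          // false
    println(alice1.equals(null))    // false
    println(people.size)            // 2
  }
}
```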

The code has a number of nuances which are critically important:

  1. Extending scala.Equals - Ensures the idiomatic equals pattern, which formalizes the canEqual method, is fully implemented: because scala.Equals declares canEqual as abstract, the compiler forces you to provide it. While extending it is technically optional, it remains highly recommended.
  2. Same instance short circuit - Testing (this eq person) for true avoids all further (potentially expensive) comparisons, as it is literally the same instance. This test must be inside the pattern match because the eq method is available on AnyRef, not on Any (the type of that). Because AnyRef is an ancestor of Person, successfully matching that as a Person also establishes that it is an AnyRef, so the one type test enables both the eq check and the field comparisons. While this test is technically optional, it remains highly recommended.
  3. Check that.canEqual(this) - It's very easy to get this backwards, which is INCORRECT. It is crucial that canEqual be executed on the that instance with this passed as the argument. While it might seem redundant given the pattern match (reaching this line of code means that must be a Person instance), the call is still needed: every descendant of Person also matches the Person pattern, so we cannot assume that is an equals-compatible descendant of Person. If the class is marked final, this test is optional and may be safely removed. Otherwise, it is required.
  4. Checking hashCode short circuit - While neither sufficient nor required, if this hashCode test is false, it eliminates the need to perform all the field-level checks (item 5). If this test is true, the field-by-field check is still required. This test is optional, and may be omitted if the hashCode value isn't cached and the total cost of the per-field equality checks is low enough.
  5. Per-field equality checks - Even if the hashCode test is present and succeeds, all of the field values must still be checked. Although it is highly improbable, two non-equivalent instances can generate exactly the same hashCode value. If an ancestor has overridden equals, the parent's equals must also be invoked (see item 7) so that any additional fields defined in ancestors are also tested.
  6. Pattern match case _ => - This achieves two different effects. First, because a Scala type pattern never matches null, a null argument is always routed here, so null never has to appear anywhere within our pure Scala code. Second, this case handles any that which is not an instance of Person or one of its descendants.
  7. When to call each of super.equals and super.hashCode is a bit tricky - If an ancestor has already overridden both of these (an ancestor should never override only one of them), it is imperative you incorporate super.* in your own overridden implementations. If no ancestor has overridden them, your overridden implementations must avoid invoking super.*. The Person code example above shows the case where no ancestor has overridden either method, so each super.* call would incorrectly fall all the way through to the default java.lang.Object.* implementation, which would invalidate the assumed combined contract for equals and hashCode.
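Item 4 mentions caching the hash code. One way to do that, sketched below under the assumption that all fields are immutable vals (caching a hash over mutable state would be a bug), is a private lazy val. CachedPerson and its hashComputations counter are hypothetical names; the counter exists only to make the caching observable.

```scala
// Hypothetical variant of Person that caches its hash code (items 4 and 5).
// Caching is only valid because every field is an immutable val.
class CachedPerson(val name: String, val age: Int) extends Equals {
  // Computed on first use, then reused; costs extra per-instance storage.
  private lazy val cachedHash: Int = {
    CachedPerson.hashComputations += 1 // demo-only instrumentation
    31 * name.## + age.##
  }

  override def hashCode(): Int = cachedHash

  override def canEqual(that: Any): Boolean =
    that.isInstanceOf[CachedPerson]

  override def equals(that: Any): Boolean =
    that match {
      case other: CachedPerson =>
        (this eq other) ||
          (other.canEqual(this) &&
            hashCode() == other.hashCode() && // cheap after the first call
            name == other.name &&             // still required: hashes can collide
            age == other.age)
      case _ =>
        false
    }
}

object CachedPerson {
  var hashComputations: Int = 0 // counts how often the hash is actually computed
}

object CachedPersonDemo {
  val alice = new CachedPerson("Alice", 30)
  val first: Int = alice.hashCode()
  val second: Int = alice.hashCode() // served from the cache
  val computations: Int = CachedPerson.hashComputations

  def main(args: Array[String]): Unit = {
    println(first == second)                       // true
    println(computations)                          // 1
    println(alice == new CachedPerson("Alice", 30)) // true
  }
}
```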

This is the super.equals based code to use ONLY IF there is at least one ancestor who has explicitly overridden equals already.

override def equals(that: Any): Boolean =
  ...
    case person: Person =>
      ( ...
                //WARNING: including the next line ASSUMES at least one ancestor has already overridden equals; i.e. that this does not end up invoking java.lang.Object.equals
                &&  (     super.equals(person)     //incorporate checking ancestor(s)' fields
                      &&  (name == person.name)
                      &&  (age == person.age)
                )
            ...
      )
    ...

This is the super.hashCode based code to use ONLY IF there is at least one ancestor who has explicitly overridden hashCode already.

override def hashCode(): Int =
  31 * (
    31 * (
      //WARNING: including the next line ASSUMES at least one ancestor has already overridden hashCode; i.e. that this does not end up invoking java.lang.Object.hashCode
      super.hashCode  //incorporate adding ancestor(s)' hashCode (and thereby, their fields)
    ) + name.##
  ) + age.##
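Putting the two super-based variants into a compilable form requires an ancestor that really does override both methods. The sketch below assumes hypothetical Point/ColoredPoint classes: Point follows the no-super pattern, while ColoredPoint, whose ancestor Point overrides both methods, chains to super.equals and super.hashCode as described in item 7 and narrows canEqual as described in item 3.

```scala
class Point(val x: Int, val y: Int) extends Equals {
  override def canEqual(that: Any): Boolean =
    that.isInstanceOf[Point]

  // No ancestor overrides equals/hashCode, so no super.* calls here.
  override def equals(that: Any): Boolean =
    that match {
      case point: Point =>
        (this eq point) ||
          (point.canEqual(this) && x == point.x && y == point.y)
      case _ =>
        false
    }

  override def hashCode(): Int =
    31 * x.## + y.##
}

// Point overrides both methods, so ColoredPoint must incorporate super.*.
class ColoredPoint(x0: Int, y0: Int, val color: String) extends Point(x0, y0) {
  // Only another ColoredPoint may equal a ColoredPoint.
  override def canEqual(that: Any): Boolean =
    that.isInstanceOf[ColoredPoint]

  override def equals(that: Any): Boolean =
    that match {
      case cp: ColoredPoint =>
        (this eq cp) ||
          (cp.canEqual(this) &&
            super.equals(cp) && // Point.equals checks x and y
            color == cp.color)
      case _ =>
        false
    }

  override def hashCode(): Int =
    31 * super.hashCode() + color.## // Point.hashCode covers x and y
}

object SuperEqualsDemo {
  val p = new Point(1, 2)
  val cp = new ColoredPoint(1, 2, "red")

  def main(args: Array[String]): Unit = {
    println(p == cp)                             // false: cp.canEqual(p) rejects a plain Point
    println(cp == p)                             // false: p does not match the ColoredPoint pattern
    println(cp == new ColoredPoint(1, 2, "red")) // true
  }
}
```

If Point.equals called this.canEqual(point) instead of point.canEqual(this), p == cp would become true while cp == p stays false, which is exactly the asymmetry item 3 warns about.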

One final note: while doing research for this, I couldn't believe how many flawed implementations of this pattern exist. It is clearly still an area where it's difficult to get the details right:

  1. Programming in Scala, First Edition - Missed 1, 2, and 4 above.
  2. Alvin Alexander's Scala Cookbook - Missed 1, 2, and 4.
  3. Code Examples for Programming in Scala - Incorrectly uses .hashCode instead of .## on class fields when implementing the class's hashCode override. See Tree3.scala


Yes, overriding equals and hashCode is a daunting task in both Java and Scala. I would recommend not using equals at all, and instead using a type class (Eq/Eql etc.). It's more typesafe (compiler error when comparing unrelated types), easier to implement (no overriding and class checking) and more flexible (you can write type class instances separately from the data class). Dotty uses a concept of "multiversal equality" which offers a choice between catching some obviously incorrect usages of equals and strict equality checking à la Haskell.
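A minimal sketch of the type class approach, assuming a hand-rolled Eq trait and === syntax (Eq, EqSyntax, EqOps and Employee are hypothetical names; Cats provides a similar Eq type class, but this is not its exact API). The data class needs no equals, hashCode or canEqual at all, and comparing an Employee with an unrelated type fails to compile.

```scala
// A minimal, hypothetical Eq type class.
trait Eq[A] {
  def eqv(x: A, y: A): Boolean
}

object EqSyntax {
  // a === b only compiles when b has the same static type as a
  // and an Eq instance for that type is in implicit scope.
  implicit class EqOps[A](private val self: A) extends AnyVal {
    def ===(other: A)(implicit ev: Eq[A]): Boolean = ev.eqv(self, other)
  }
}

// A plain data class: no equals/hashCode overrides, no canEqual.
class Employee(val name: String, val age: Int)

object Employee {
  // Defined in the companion, so it is found without an explicit import.
  implicit val employeeEq: Eq[Employee] = new Eq[Employee] {
    def eqv(x: Employee, y: Employee): Boolean =
      x.name == y.name && x.age == y.age
  }
}

object EqDemo {
  import EqSyntax._

  val alice1 = new Employee("Alice", 30)
  val alice2 = new Employee("Alice", 30)
  val bob = new Employee("Bob", 30)

  val same: Boolean = alice1 === alice2
  val different: Boolean = alice1 === bob
  // alice1 === "Alice" // does not compile: type mismatch

  def main(args: Array[String]): Unit = {
    println(same)      // true
    println(different) // false
  }
}
```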
