What is the standard idiom for implementing equals and hashCode in Scala?
What is the standard idiom for implementing the equals and hashCode methods in Scala?
I know the preferred approach is discussed in Programming in Scala, but I don't currently have access to the book.
There's a free 1st edition of PinS that discusses this subject just as well. However, I think the best source is this article by Odersky discussing equality in Java. The discussion in PinS is, IIRC, an abbreviated version of this article.
After doing quite a bit of research, I couldn't find an answer with an essential and correct implementation of the equals and hashCode pattern for a Scala class (not a case class, as that is automatically generated by the compiler and should not be overridden). I did find a 2.10 (stale) macro making a valiant attempt at solving this.
I ended up combining the patterns recommended in "Effective Java, 2nd Edition" (by Joshua Bloch) and in the article "How to Write an Equality Method in Java" (by Martin Odersky, Lex Spoon, and Bill Venners), and have created a standard default pattern I now use to implement equals and hashCode for my Scala classes.
The primary goal of the equals pattern is to minimize the number of actual comparisons that must execute to reach a valid and definitive true or false.
Additionally, the hashCode method should ALWAYS be overridden and re-implemented when the equals method is overridden (again, see "Effective Java, 2nd Edition" by Joshua Bloch). Hence my inclusion of the hashCode method "pattern" in the code below, which also incorporates critical advice about using ## instead of hashCode in the actual implementation.
It is worth mentioning that each of super.equals and super.hashCode must be called only if an ancestor has already overridden it. If not, then it is imperative to NOT call super.*, because the call would land on the default implementation in java.lang.Object (equals compares for the same instance, and hashCode most likely converts the memory address of the object into an integer), and either default will break the specified equals and hashCode contract for the now overridden methods.
class Person(val name: String, val age: Int) extends Equals {
  override def canEqual(that: Any): Boolean =
    that.isInstanceOf[Person]

  //Intentionally avoiding the call to super.equals because no ancestor has overridden equals (see note 7 below)
  override def equals(that: Any): Boolean =
    that match {
      case person: Person =>
        (      (this eq person)                   //optional, but highly recommended sans very specific knowledge about this exact class implementation
          ||  (     person.canEqual(this)         //optional only if this class is marked final
                &&  (hashCode == person.hashCode) //optional, exceptionally execution efficient if hashCode is cached, at an obvious space inefficiency tradeoff
                &&  (     (name == person.name)
                      &&  (age == person.age)
                    )
              )
        )
      case _ =>
        false
    }

  //Intentionally avoiding the call to super.hashCode because no ancestor has overridden hashCode (see note 7 below)
  override def hashCode(): Int =
    31 * (
      name.##
    ) + age.##
}
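To sanity-check the pattern, it helps to exercise it through == and a hash-based collection. The sketch below is only an illustration: it repeats a condensed Person (with the optional hashCode short circuit left out) so that it compiles on its own, and PersonDemo is a hypothetical name that is not part of the original answer.

```scala
import scala.collection.mutable

// Condensed copy of the Person pattern, repeated so this block is self-contained.
class Person(val name: String, val age: Int) extends Equals {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[Person]

  override def equals(that: Any): Boolean = that match {
    case person: Person =>
      (this eq person) ||
        (person.canEqual(this) && name == person.name && age == person.age)
    case _ => false
  }

  override def hashCode(): Int = 31 * name.## + age.##
}

// Hypothetical demo object, used only to exercise the class.
object PersonDemo {
  def main(args: Array[String]): Unit = {
    val a = new Person("Ann", 30)
    val b = new Person("Ann", 30)
    val c = new Person("Bob", 30)

    println(a == b)                          // true: same field values
    println(a == c)                          // false: names differ
    println(a.## == b.##)                    // true: equal objects share a hash
    println(a == null)                       // false: null is routed to `case _`
    println(mutable.HashSet(a).contains(b))  // true: lookup relies on both methods
  }
}
```

The HashSet lookup is the case that breaks when equals is overridden without hashCode, which is why the two methods are always implemented together.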
The code has a number of nuances which are critically important:
1. Extending scala.Equals - Ensures the equals idiomatic pattern, which includes the canEqual method formalization, is being fully implemented. While extending it is technically optional, it remains highly recommended.
2. Same instance short circuit - Testing (this eq person) for true ensures no further (expensive) comparisons, as it is literally the same instance. This test is required to be inside the pattern match, as the eq method is available on AnyRef, not on Any (the type of that). And because AnyRef is an ancestor of Person, this technique performs two type validations at once: validating the descendant, Person, implies an automatic type validation of all its ancestors, including AnyRef, which is required for the eq check. While this test is technically optional, it remains highly recommended.
3. Check that's canEqual - It's very easy to get this backwards, which is INCORRECT. It is crucial that the check of canEqual be executed on the that instance with this provided as the parameter. And while it might seem redundant to the pattern match (given we get to this line of code, that must be a Person instance), we must still make the method call, as we cannot assume that is an equals-compatible descendant of Person (all descendants of Person will successfully pattern match as a Person). If the class is marked final, this test is optional and may be safely removed. Otherwise, it is required.
4. Checking hashCode short circuit - While neither sufficient nor required, if this hashCode test is false, it eliminates the need to perform all the value level checks (item 5). If this test is true, then a field by field check is still required. This test is optional, and may be excluded if the hashCode value isn't cached and the total cost of the per-field equality checks is low enough.
5. Per-field equality checks - Even if the hashCode test is provided and succeeds, all of the field level values must still be checked. This is because, although it is highly improbable, it remains possible for two different instances to generate the exact same hashCode value and still not be equivalent at the field level. The parent's equals must also be invoked (when an ancestor has overridden it, see item 7) to ensure any additional fields defined in ancestors are also tested.
6. Pattern match case _ => - This actually achieves two different effects. First, the Scala pattern match guarantees a null is properly routed here, so null doesn't have to appear anywhere within our pure Scala code. Second, the pattern match guarantees that whatever that is, it isn't an instance of Person or one of its descendants.
7. When to call each of super.equals and super.hashCode is a bit tricky - If an ancestor has already overridden both of these (it should never be just one), it is imperative that you incorporate super.* in your own overridden implementations. And if no ancestor has overridden both, then your overridden implementations must avoid invoking super.*. The Person code example above shows the case where no ancestor has overridden both. So each super.* method call would incorrectly fall all the way through to the default java.lang.Object.* implementation, which would invalidate the assumed combined contract for equals and hashCode.
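The warning about the direction of the canEqual call is easier to see with a subclass that narrows canEqual. The following is a minimal sketch, not code from the original answer: the GoodPoint and BadPoint hierarchies are hypothetical and differ only in which instance canEqual is invoked on.

```scala
// Correct direction: ask `that` whether it can equal `this`.
class GoodPoint(val x: Int) extends Equals {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[GoodPoint]
  override def equals(that: Any): Boolean = that match {
    case p: GoodPoint => p.canEqual(this) && x == p.x
    case _            => false
  }
  override def hashCode(): Int = x.##
}

class GoodColoredPoint(x: Int, val color: String) extends GoodPoint(x) {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[GoodColoredPoint]
  override def equals(that: Any): Boolean = that match {
    case p: GoodColoredPoint => p.canEqual(this) && super.equals(p) && color == p.color
    case _                   => false
  }
  override def hashCode(): Int = 31 * super.hashCode() + color.##
}

// Backwards direction: asks `this` whether it can equal `that`.
class BadPoint(val x: Int) extends Equals {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[BadPoint]
  override def equals(that: Any): Boolean = that match {
    case p: BadPoint => this.canEqual(p) && x == p.x // INCORRECT on purpose
    case _           => false
  }
  override def hashCode(): Int = x.##
}

class BadColoredPoint(x: Int, val color: String) extends BadPoint(x) {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[BadColoredPoint]
  override def equals(that: Any): Boolean = that match {
    case p: BadColoredPoint => this.canEqual(p) && super.equals(p) && color == p.color
    case _                  => false
  }
  override def hashCode(): Int = 31 * super.hashCode() + color.##
}

object CanEqualDemo {
  def main(args: Array[String]): Unit = {
    val gp  = new GoodPoint(1)
    val gcp = new GoodColoredPoint(1, "red")
    println(gp == gcp) // false
    println(gcp == gp) // false: symmetric

    val bp  = new BadPoint(1)
    val bcp = new BadColoredPoint(1, "red")
    println(bp == bcp) // true: the subclass never gets a say
    println(bcp == bp) // false: symmetry is broken
  }
}
```

With the backwards call, the base class only confirms what the pattern match already established, so the subclass's narrower canEqual is never consulted.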
This is the super.equals based code to use ONLY IF there is at least one ancestor who has explicitly overridden equals already.
override def equals(that: Any): Boolean =
  ...
    case person: Person =>
      ( ...
                //WARNING: including the next line ASSUMES at least one ancestor has already overridden equals; i.e. that this does not end up invoking java.lang.Object.equals
                &&  (     super.equals(person) //incorporate checking ancestor(s)' fields
                      &&  (name == person.name)
                      &&  (age == person.age)
                    )
        ...
      )
  ...
This is the super.hashCode based code to use ONLY IF there is at least one ancestor who has explicitly overridden hashCode already.
override def hashCode(): Int =
  31 * (
    31 * (
      //WARNING: including the next line ASSUMES at least one ancestor has already overridden hashCode; i.e. that this does not end up invoking java.lang.Object.hashCode
      super.hashCode //incorporate adding ancestor(s)' hashCode (and thereby, their fields)
    ) + name.##
  ) + age.##
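As a complete illustration of the two super.* fragments, here is a minimal sketch of a subclass whose ancestor does override both methods. Employee and its company field are hypothetical names introduced for this example; Person is repeated in condensed form so the block compiles on its own.

```scala
// Condensed Person: no ancestor overrides equals/hashCode, so no super.* calls here.
class Person(val name: String, val age: Int) extends Equals {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[Person]

  override def equals(that: Any): Boolean = that match {
    case person: Person =>
      (this eq person) ||
        (person.canEqual(this) && name == person.name && age == person.age)
    case _ => false
  }

  override def hashCode(): Int = 31 * name.## + age.##
}

// Hypothetical subclass: Person has overridden both methods, so super.* is safe
// and is needed to cover the inherited fields.
class Employee(name: String, age: Int, val company: String) extends Person(name, age) {
  override def canEqual(that: Any): Boolean = that.isInstanceOf[Employee]

  override def equals(that: Any): Boolean = that match {
    case employee: Employee =>
      (this eq employee) ||
        (employee.canEqual(this) &&
          super.equals(employee) &&      // checks name and age via Person.equals
          company == employee.company)
    case _ => false
  }

  override def hashCode(): Int = 31 * super.hashCode() + company.##
}

object EmployeeDemo {
  def main(args: Array[String]): Unit = {
    val p  = new Person("Ann", 30)
    val e1 = new Employee("Ann", 30, "Acme")
    val e2 = new Employee("Ann", 30, "Acme")

    println(e1 == e2)       // true
    println(e1.## == e2.##) // true
    println(p == e1)        // false: e1.canEqual(p) rejects a plain Person
    println(e1 == p)        // false: p is not an Employee
  }
}
```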
One final note: While researching this, I couldn't believe how many flawed implementations of this pattern exist. It is clearly still an area where it's difficult to get the details right:
- Programming in Scala, First Edition - Missed 1, 2, and 4 above.
- Alvin Alexander's Scala Cookbook - Missed 1, 2, and 4.
- Code Examples for Programming in Scala - Incorrectly uses .hashCode instead of .## on class fields when generating the class's hashCode override and implementation. See Tree3.scala
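The reason .## is preferred over .hashCode on fields can be shown with a small experiment. This is a sketch with a hypothetical HashHashDemo object: Scala's == treats boxed numeric values by value, and ## is defined to stay consistent with that, while hashCode delegates to the underlying Java box. ## is also safe on null.

```scala
// Hypothetical demo object comparing ## with hashCode.
object HashHashDemo {
  def main(args: Array[String]): Unit = {
    val i: Any = 1
    val d: Any = 1.0

    println(i == d)                   // true: boxed numerics compare by value
    println(i.## == d.##)             // true: ## agrees with ==
    println(i.hashCode == d.hashCode) // false: Integer and Double hash differently

    val missing: String = null
    println(missing.##)               // 0, whereas missing.hashCode would throw a NullPointerException
  }
}
```

A hashCode built from field.hashCode can therefore disagree with an equals built from field == other.field, or fail outright on a null field.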
Yes, overriding equals and hashCode is a daunting task in both Java and Scala. I would recommend not using equals at all, and instead using a type class (Eq/Eql etc.). It's more typesafe (compiler error when comparing unrelated types), easier to implement (no overriding and class checking) and more flexible (you can write type class instances separately from the data class). Dotty uses a concept of "multiversal equality" which offers a choice between catching some obviously incorrect usages of equals and strict equality checking a la Haskell.
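A minimal sketch of the type class approach might look like the following. The Eq trait, its eqv method, and the === operator are defined here purely for illustration and are assumptions, not the API of any particular library (libraries such as Cats ship their own, richer versions).

```scala
// Hypothetical minimal equality type class.
trait Eq[A] {
  def eqv(x: A, y: A): Boolean
}

object Eq {
  // Enables `a === b` whenever an Eq[A] instance is in implicit scope.
  implicit class EqOps[A](private val x: A) extends AnyVal {
    def ===(y: A)(implicit ev: Eq[A]): Boolean = ev.eqv(x, y)
  }
}

// Plain data class: no equals or hashCode overrides at all.
final class Person(val name: String, val age: Int)

object Person {
  // The instance lives apart from the class body and could also be defined elsewhere.
  implicit val personEq: Eq[Person] = new Eq[Person] {
    def eqv(x: Person, y: Person): Boolean = x.name == y.name && x.age == y.age
  }
}

object EqDemo {
  import Eq._

  def main(args: Array[String]): Unit = {
    val a = new Person("Ann", 30)
    val b = new Person("Ann", 30)

    println(a === b) // true: field-wise comparison through the type class
    println(a == b)  // false: the untouched equals is still reference equality
    // a === "Ann"   // does not compile: a String is not a Person
  }
}
```

Note that hash-based collections still rely on equals and hashCode, so this sketch only covers explicit comparisons.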