subsetOf versus forall contains
Consider I have:
case class X(...)
val xs: Seq[X] = ... // some method result
val ys: Seq[X] = ... // some other method result
While the following holds:
xs.distinct.sameElements(xs) // true
ys.distinct.sameElements(ys) // true
I am facing:
xs forall(ys contains _) // true
xs.toSet subsetOf ys.toSet // false
Why? I mean, it´s clear that making a Set
out of a Seq
chooses random elements in case of duplicates, but there are no duplicates because of "(...).distinct.sameElements(...)
".
I certainly need a deeper understanding of the kind of equality check...
EDIT:
After a long search, I found the problem and conden开发者_JAVA百科sed it to the following:
My elements are not the same, however I must take a closer look why distinct.sameElements
isn´t complaining. But meanwhile a new question arose:
Consider this:
val rnd = scala.util.Random
def int2Label(i: Int) = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1,2,3,4)
// as expected :
val m1: Map[Int,String] = s.map(i => (i,int2Label(i))).toMap
println(m1) // Map(5 -> [ 555], 1 -> [ ], 2 -> [ 22], 3 -> [ ])
println(m1) // Map(5 -> [ 555], 1 -> [ ], 2 -> [ 22], 3 -> [ ])
// but accessing m2 several times yields different results. Why?
val m2: Map[Int,String] = s.map(i => (i,i)).toMap.mapValues { int2Label(_) }
println(m2) // Map(5 -> [ 5], 1 -> [ 11], 2 -> [ 22], 3 -> [ 333])
println(m2) // Map(5 -> [ 55], 1 -> [ 11], 2 -> [ ], 3 -> [ ])
So my elements in my first to sequences aren´t the same because they depend on a m2
-construct and so each time a accessing them they are different.
My new question is, why does m2
behave like a function in contrast to m1
although both are immutable maps. That isn´t intuitively for me.
The most common reasons for problems in this area--testing set equality and the like--are
hashCode
does not agree withequals
- Your values are not stable (so previous
hashCode
does not agree with currentequals
)
The reason is that this matters is that distinct
and toSet
use hash codes to build sets, whereas contains
simply runs over the collection with an exists
:
xs forall(ys contains _) == xs forall (x => ys exists (y => x==y) )
This is made more complicated by the fact that many sets don't start using hash codes until they're larger than some minimal size (usually 4), so you don't always notice this with testing. But let's prove it to ourselves:
class Liar(s: String) {
override def equals(o: Any) = o match {
case l: Liar => s == l.s
case _ => _
}
// No hashCode override!
}
val strings = List("Many","song","lyrics","go","na","na","na","na")
val lies = strings.map(s => new Liar(s))
val truly_distinct = lies.take(5)
lies.length // 8
lies.distinct.length // 8!
lies.toSet.size // 8!
lies forall( truly_distinct contains _ ) // True, because it's true
lies.toSet subsetOf truly_distinct.toSet // False--not even the same size!
Okay, so now we know that for most of these operations, matching up hashCode
and equals
is a Good Thing.
Warning: in Java, mismatches happens frequently even with primitives:
new java.lang.Float(1.0) == new java.lang.Integer(1) // True
(new java.lang.Float(1.0)).hashCode == (new java.lang.Integer(1)).hashCode // Uh-oh
but Scala now at least catches that (hopefully every time):
(new java.lang.Float(1.0)).## == (new java.lang.Integer(1)).## // Whew
Case classes also do this properly, so we're left with three possibilities
- You overrode
equals
but nothashCode
to match - Your values are not stable
- There is a bug and Java wrapped primitive hashCode mismatch is coming back to bite you
The first one is easy enough.
The second one seems to be your problem, and it arises from the fact that mapValues
actually creates a view of the original collection, not a new collection. (filterKeys
does this also.) Personally, I think this is a questionable choice of design, since normally when you have a view and you want to make a single concrete instance of it, you .force
it. But default maps don't have a .force
because they don't realize that they might be views. So you have to resort to things like
myMap.map{ case (k,v) => (k, /* something that produces a new v */) }
myMap.mapValues(v => /* something that produces a new v */).view.force
Map() ++ myMap.mapValues(v => /* something that produces a new v */)
This is really important if you're doing things like file IO to map your values (e.g. if your values are filenames and you're mapping to their contents) and you don't want to read the file over and over again.
But your case--where you're assigning random values--is another where it is important to pick a single copy, not recreate the values over and over.
精彩评论