subsetOf versus forall contains

2023-02-19 13:24 Q&A author:

Suppose I have:

case class X(...)
val xs: Seq[X] = ... // some method result
val ys: Seq[X] = ... // some other method result

The following holds:

xs.distinct.sameElements(xs) // true
ys.distinct.sameElements(ys) // true

but I get:

xs forall(ys contains _)    // true
xs.toSet subsetOf ys.toSet  // false

Why? I understand that converting a Seq to a Set keeps only one of any duplicate elements, but there are no duplicates here, because "(...).distinct.sameElements(...)" is true.

I certainly need a deeper understanding of the kind of equality check...

EDIT:

After a long search, I found the problem and condensed it to the following:

My elements are not the same after all, though I still have to look more closely at why distinct.sameElements doesn't complain. But meanwhile, a new question came up:

Consider this:

val rnd = scala.util.Random
def int2Label(i: Int) = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1,2,3,4)

// as expected :
val m1: Map[Int,String] = s.map(i => (i,int2Label(i))).toMap
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])
println(m1) // Map(5 -> [ 555], 1 -> [    ], 2 -> [  22], 3 -> [    ])

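// but accessing m2 several times yields different results. Why?
// (Scala 2.12 and earlier; in 2.13+, mapValues returns a MapView, so this type annotation no longer compiles)
val m2: Map[Int,String] = s.map(i => (i,i)).toMap.mapValues { int2Label(_) }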
println(m2) // Map(5 -> [   5], 1 -> [  11], 2 -> [  22], 3 -> [ 333])
println(m2) // Map(5 -> [  55], 1 -> [  11], 2 -> [    ], 3 -> [    ])

So the elements in my first two sequences aren't the same: they depend on an m2-like construct, so they are different each time I access them.

My new question is: why does m2 behave like a function, unlike m1, even though both are immutable maps? That isn't intuitive to me.

The most common reasons for problems in this area (testing set equality and the like) are:

hashCode does not agree with equals
Your values are not stable (so a previously computed hashCode no longer agrees with the current equals)

This matters because distinct and toSet use hash codes to build sets, whereas contains simply scans the collection with exists:

xs forall (ys contains _)                  // is equivalent to
xs forall (x => ys exists (y => x == y))
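A minimal sketch of the second cause, assuming a hypothetical mutable case class `Cell` (not from the question): once an element's hash code changes after insertion, a hash-based lookup misses it, while a linear scan with `==` still finds it.

```scala
import scala.collection.immutable.HashSet

// Hypothetical mutable case class: its generated equals/hashCode depend on
// the var, so its hash code changes whenever `n` changes.
case class Cell(var n: Int)

val c = Cell(1)
val cells = HashSet(c)   // stored under the hash code of Cell(1)
c.n = 2                  // c now equals Cell(2), which has a different hash code

// Hash-based lookup: computes c's current hash, finds no match.
val foundByHash = cells.contains(c)
// Linear scan with == (what Seq.contains does): still finds it.
val foundByScan = cells.exists(_ == c)
```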

This is made more complicated by the fact that many sets don't start using hash codes until they're larger than some minimal size (for Scala's default immutable Set, which has special classes for up to four elements, that size is 4), so you don't always notice the problem in small tests. But let's prove it to ourselves:

class Liar(val s: String) {   // val, so that l.s is accessible below
  override def equals(o: Any) = o match {
    case l: Liar => s == l.s
    case _ => false
  }
  // No hashCode override!
}
val strings = List("Many","song","lyrics","go","na","na","na","na")
val lies = strings.map(s => new Liar(s))
val truly_distinct = lies.take(5)
lies.length          // 8
lies.distinct.length // 8!
lies.toSet.size      // 8!
lies forall (truly_distinct contains _)    // true, because it's true
lies.toSet subsetOf truly_distinct.toSet   // false: not even the same size!
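The small-set caveat can be checked the same way. This sketch assumes Scala's default immutable `Set`, whose builder keeps up to four elements in small special-purpose sets that deduplicate with `equals`, and switches to a hash set beyond that; `NoHash` is a hypothetical stand-in for `Liar`.

```scala
// Hypothetical class that, like Liar, overrides equals but not hashCode.
class NoHash(val s: String) {
  override def equals(o: Any): Boolean = o match {
    case other: NoHash => s == other.s
    case _             => false
  }
}

def build(xs: String*): List[NoHash] = xs.toList.map(new NoHash(_))

// Two equal elements: the small set compares with equals and keeps one.
val small = build("na", "na").toSet.size
// Five distinct elements push the set past four, so it becomes hash-based;
// the second "na" has a different identity hash code and is kept as well.
val large = build("a", "b", "c", "d", "na", "na").toSet.size
```

A test with only a couple of elements would therefore look correct, while the same code misbehaves on larger inputs.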

Okay, so now we know that for most of these operations, matching up hashCode and equals is a Good Thing.
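For comparison, a sketch of the fix, using a hypothetical `Honest` class that is `Liar` plus a `hashCode` derived from the same field as `equals` (a `case class Honest(s: String)` would generate both for you). With the two methods in agreement, the hash-based and the scan-based checks give the same answers.

```scala
// Hypothetical fixed version of Liar.
class Honest(val s: String) {
  override def equals(o: Any): Boolean = o match {
    case h: Honest => s == h.s
    case _         => false
  }
  // Equal objects must have equal hash codes: derive it from the same field.
  override def hashCode: Int = s.hashCode
}

val strings = List("Many", "song", "lyrics", "go", "na", "na", "na", "na")
val honest = strings.map(new Honest(_))
val trulyDistinct = honest.take(5)

val distinctCount = honest.distinct.length                    // 5
val setSize = honest.toSet.size                               // 5
val viaContains = honest.forall(trulyDistinct.contains(_))    // true
val viaSubset = honest.toSet.subsetOf(trulyDistinct.toSet)    // true
```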

Warning: mismatches happen frequently even with Java's boxed primitives:

new java.lang.Float(1.0) == new java.lang.Integer(1)                       // true
(new java.lang.Float(1.0)).hashCode == (new java.lang.Integer(1)).hashCode // false. Uh-oh

but Scala's ## now at least catches that (hopefully every time):

(new java.lang.Float(1.0)).## == (new java.lang.Integer(1)).##   // true. Whew
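A small sketch of the same point with values typed as `Any` (the variable names are illustrative): Scala's `==` compares boxed numbers by value, Java's `hashCode` disagrees, and `##` agrees with `==`. Scala's immutable `HashSet` hashes its elements with `##`, which is why a lookup across box types still works.

```scala
import scala.collection.immutable.HashSet

val f: Any = 1.0f   // boxed as java.lang.Float
val i: Any = 1      // boxed as java.lang.Integer

val equalByScala = f == i                       // Scala's == compares numbers by value
val javaHashesAgree = f.hashCode == i.hashCode  // Float and Integer hash differently
val scalaHashesAgree = f.## == i.##             // ## is consistent with ==

// The set hashes with ##, so looking up the Float finds the Integer.
val found = HashSet[Any](i).contains(f)
```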

Case classes also do this properly, so we're left with three possibilities:

You overrode equals but not hashCode to match
Your values are not stable
There is a bug, and the hashCode mismatch of Java's boxed primitives is coming back to bite you

The first one is easy enough.

The second one seems to be your problem. It arises because mapValues actually creates a view of the original collection, not a new collection (filterKeys does this too), so the mapping function is rerun every time a value is accessed. Personally, I think this is a questionable design choice: normally, when you have a view and want a single concrete instance of it, you .force it. But default maps don't have a .force, because they don't know that they might be views. So you have to resort to things like

myMap.map{ case (k,v) => (k, /* something that produces a new v */) }
myMap.mapValues(v => /* something that produces a new v */).view.force
Map() ++ myMap.mapValues(v => /* something that produces a new v */)
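To make the difference visible, here is a sketch that counts how often the mapping function runs. It assumes Scala 2.13 or later, where `view.mapValues` returns a lazy `MapView` (the deprecated `Map.mapValues` returns one too, so the laziness shows up in the type); `Counter` and `label` are hypothetical helpers for illustration.

```scala
// Hypothetical counter, only used to observe how often the function runs.
final class Counter { var calls = 0 }

def label(c: Counter, v: Int): String = { c.calls += 1; s"<$v>" }

val base = Map(1 -> 1, 2 -> 2, 3 -> 3)

// Lazy: the MapView reruns the function on every lookup.
val lazyCounter = new Counter
val lazyView = base.view.mapValues(v => label(lazyCounter, v))
(1 to 4).foreach(_ => lazyView(1))   // 4 lookups of the same key

// Strict: map builds a new Map, running the function once per entry.
val strictCounter = new Counter
val strict = base.map { case (k, v) => k -> label(strictCounter, v) }
(1 to 4).foreach(_ => strict(1))

// Forcing the view with toMap also runs the function once per entry.
val forcedCounter = new Counter
val forced = base.view.mapValues(v => label(forcedCounter, v)).toMap
(1 to 4).foreach(_ => forced(1))
```

The lazy view pays one call per lookup (four here), while both strict versions pay once per entry (three here) up front and never again.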

This is really important if you map your values with something like file I/O (e.g. your values are filenames and you map them to their contents) and you don't want to read the files over and over again.

Your case, where you're assigning random values, is another one where it is important to compute a single copy of the values rather than recreate them on every access.
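Applied to the question, a sketch (again assuming Scala 2.13 or later) that pins down one set of random labels: `m1` is built strictly as before, and the lazy view is turned into an ordinary `Map` with `.toMap` before use. `m2View` is a name introduced for illustration.

```scala
import scala.util.Random

val rnd = Random
// The question's helper: a random-length run of the digit, padded to width 4.
def int2Label(i: Int): String = "[%4s]".format(Seq.fill(rnd.nextInt(4))(i).mkString)
val s = Seq(1, 2, 3, 4)

// Computed once per key when the Map is built; every access sees the same label.
val m1: Map[Int, String] = s.map(i => i -> int2Label(i)).toMap

// A lazy view would call int2Label again on each access...
val m2View = s.map(i => i -> i).toMap.view.mapValues(int2Label)
// ...so take a single snapshot of it with toMap.
val m2: Map[Int, String] = m2View.toMap

println(m2)
println(m2)   // same labels as the line above
```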
