How do I flatten a nested For Comprehension that uses I/O?

I am having trouble flattening a nested For Generator into a single For Generator.

I created MapSerializer to save and load Maps.

Listing of MapSerializer.scala:

import java.io.{ObjectInputStream, ObjectOutputStream}

object MapSerializer {
  def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
    (for (_ <- 1 to in.readInt()) yield {
      val key = in.readUTF()
      for (_ <- 1 to in.readInt()) yield {
        val value = in.readInt()
        (key, value)
      }
    }).flatten.groupBy(_ _1).mapValues(_ map(_ _2))

  def saveMap(out: ObjectOutputStream, map: Map[String, Seq[Int]]) {
    out.writeInt(map size)
    for ((key, values) <- map) {
      out.writeUTF(key)
      out.writeInt(values size)
      values.foreach(out.writeInt(_))
    }
  }
}

Modifying loadMap to assign key within the generator causes it to fail:

def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
  (for (_ <- 1 to in.readInt();
        key = in.readUTF()) yield {
    for (_ <- 1 to in.readInt()) yield {
      val value = in.readInt()
      (key, value)
    }
  }).flatten.groupBy(_ _1).mapValues(_ map(_ _2))

Here is the stacktrace I get:

java.io.UTFDataFormatException
    at java.io.ObjectInputStream$BlockDataInputStream.readWholeUTFSpan(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readOpUTFSpan(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readWholeUTFSpan(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java)
    at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2819)
    at java.io.ObjectInputStream.readUTF(ObjectInputStream.java:1050)
    at MapSerializer$$anonfun$loadMap$1.apply(MapSerializer.scala:8)
    at MapSerializer$$anonfun$loadMap$1.apply(MapSerializer.scala:7)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
    at scala.collection.immutable.Range.foreach(Range.scala:76)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
    at scala.collection.immutable.Range.map(Range.scala:43)
    at MapSerializer$.loadMap(MapSerializer.scala:7)

I would like to flatten the loading code to a single For Comprehension, but I get errors that suggest that it is either executing in a different order or repeating steps I am not expecting it to repeat.

Why is it that moving the assignment of key into the generator causes it to fail?

Can I flatten this into a single generator? If so, what would that generator be?


Thank you for the self-contained, compiling code in your question. I don't think you want to flatten the loops, because the structure is not flat: after flattening, you need groupBy to recover it. Also, if the map contains an entry such as "zero" -> Seq(), that entry would be lost. A plain nested map avoids the groupBy and preserves keys mapped to empty sequences:

def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] = {
  val size = in.readInt
  (1 to size).map{ _ =>
    val key = in.readUTF
    val nval = in.readInt
    key -> (1 to nval).map(_ => in.readInt)
  }(collection.breakOut)
}

I use breakOut to build the right type, as otherwise I think the compiler complains about a mismatch between the generic Map and the immutable Map. You can also use Map() ++ (...).
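As a rough check of the claim about empty sequences, here is a minimal round-trip sketch through an in-memory stream. It assumes a Scala version where collection.breakOut is no longer available (it was removed in 2.13), so it ends with .toMap instead. The object name RoundTripSketch and the helper roundTrip are made up for illustration; saveMap follows the wire format from the question.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical wrapper object, not part of the original code.
object RoundTripSketch {
  // Same layout as saveMap in the question:
  // entry count, then for each entry: key, value count, values.
  def saveMap(out: ObjectOutputStream, map: Map[String, Seq[Int]]): Unit = {
    out.writeInt(map.size)
    for ((key, values) <- map) {
      out.writeUTF(key)
      out.writeInt(values.size)
      values.foreach(out.writeInt(_))
    }
  }

  // Nested map as in the answer, with .toMap standing in for breakOut.
  def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] = {
    val size = in.readInt()
    (1 to size).map { _ =>
      val key = in.readUTF()
      val nval = in.readInt()
      key -> (1 to nval).map(_ => in.readInt())
    }.toMap
  }

  // Hypothetical helper: write to a byte array, then read it back.
  def roundTrip(map: Map[String, Seq[Int]]): Map[String, IndexedSeq[Int]] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    saveMap(out, map)
    out.close() // flushes the buffered block data
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try loadMap(in) finally in.close()
  }
}
```

Calling RoundTripSketch.roundTrip(Map("zero" -> Seq(), "a" -> Seq(1, 2, 3))) should give back a map that still contains the "zero" key, which the flatten/groupBy version cannot do because an empty sequence contributes no tuples.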

Note: I arrived at this solution after being confused by your for loop and rewriting it with flatMap and map:

val tuples = (1 to size).flatMap{ _ =>
  val key = in.readUTF
  println("key " + key)
  val nval = in.readInt
  (1 to nval).map(_ => key -> in.readInt)
}

I think something happens in the for loop when you don't use some of the generators. I thought this would be equivalent to:

val tuples = for {
  _ <- 1 to size
  key = in.readUTF
  nval = in.readInt
  _ <- 1 to nval
  value = in.readInt
} yield { key -> value }

But this is not the case, so I think I'm missing something in the translation.

Edit: I figured out what's wrong with the single for loop. Short story: the way definitions inside a for comprehension are translated causes key = in.readUTF to be evaluated for every outer iteration before the inner loop runs at all. To work around this, use view and force:

val tuples = (for {
  _ <- (1 to size).view
  key = in.readUTF
  nval = in.readInt
  _ <- 1 to nval
  value = in.readInt
} yield { key -> value }).force
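To see why the view helps, here is a small sketch written with explicit map and flatMap calls on a view, so it does not depend on how a particular compiler version desugars the for comprehension. A counter iterator stands in for the input stream; ViewOrderSketch and lazyPairs are illustrative names. It uses toIndexedSeq rather than force, since as far as I know force is deprecated on views from Scala 2.13 on.

```scala
// Hypothetical demo object; the counter plays the role of the stream.
object ViewOrderSketch {
  def lazyPairs(): IndexedSeq[(Int, Int)] = {
    val iter = Iterator.from(1)
    (1 to 3).view
      // On a view this map is lazy: each outer value is only
      // produced when the flatMap below asks for the next element.
      .map { i => val outer = iter.next(); (i, outer) }
      .flatMap { case (_, outer) =>
        // The inner range is strict, so its three reads happen
        // right after the outer read they belong to.
        (1 to 3).map { _ => (outer, iter.next()) }
      }
      .toIndexedSeq // forces the view exactly once
  }
}
```

Here the outer values come out as 1, 5 and 9: each outer read is immediately followed by its three inner reads, which is the interleaving the stream format requires.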

The issue can be demonstrated more clearly with this piece of code:

val iter = Iterator.from(1)
val tuple = for {
  _ <- 1 to 3
  outer = iter.next
  _ <- 1 to 3
  inner = iter.next
} yield (outer, inner)

It returns Vector((1,4), (1,5), (1,6), (2,7), (2,8), (2,9), (3,10), (3,11), (3,12)) which shows that all outer values are evaluated before inner values. This is due to the fact that it is more or less translated to something like:

for { 
  (i, outer) <- for (i <- (1 to 3)) yield (i, iter.next)
  _ <- 1 to 3
  inner = iter.next
} yield (outer, inner)

This computes all outer iter.next values first. Going back to the original use case, in.readUTF would be called for every entry before any of the inner in.readInt calls run, so the reads no longer match the order in which the data was written.
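The translation can also be spelled out as plain map and flatMap calls on the strict Range. This is a sketch of roughly what the Scala 2 compiler generates for a definition that follows a generator, not its exact output; StrictOrderSketch and strictPairs are illustrative names.

```scala
// Hypothetical demo object mirroring the Iterator example.
object StrictOrderSketch {
  def strictPairs(): IndexedSeq[(Int, Int)] = {
    val iter = Iterator.from(1)
    (1 to 3)
      // The definition "outer = iter.next" becomes a map that pairs
      // each element with its outer value. Range.map is strict, so
      // all three outer values (1, 2, 3) are read here.
      .map { i => val outer = iter.next(); (i, outer) }
      // Only afterwards does the inner loop run for each pair.
      .flatMap { case (_, outer) =>
        (1 to 3).map { _ => (outer, iter.next()) }
      }
  }
}
```

Called on a fresh counter, strictPairs() yields Vector((1,4), (1,5), (1,6), (2,7), (2,8), (2,9), (3,10), (3,11), (3,12)), the same ordering as the for comprehension result quoted above.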


Here is the compacted version of @huynhjl's answer that I eventually deployed:

def loadMap(in: ObjectInputStream): Map[String, IndexedSeq[Int]] =
  ((1 to in.readInt()) map { _ =>
    in.readUTF() -> ((1 to in.readInt()) map { _ => in.readInt() })
  })(collection.breakOut)

The advantage of this version is that there are no direct assignments.
