开发者

recursively convert Map[Int, Map[Int, X]] to Array[Array[X]]

I am trying to write a function that converts between Maps with integer keys to the corresponding arrays. I have the base case done, but am trying to write the recursive case (i.e. multi-dimensional arrays: converting Map[Int, Map[Int, X]] to Array[Array[X]]).

This task arose out of the need to construct an array from a stream without knowing how large the array will be in advance, allowing for the possibility of elements to come off the stream in random order and the possibility of duplicate elements coming off the stream.

I have a function that does it:

def toArrayHard[X:ClassManifest](x:scala.collection.Map[Int, X]):Array[X] =
{
    if (x.size == 0) new Array(0)
    else 
    {
        val max:Int = 1 + x.keys.max

        val a:Array[X] = new Array(max)

        var i = 0
        while (i < max)
        {
            a(i) = x(i)
            i += 1
        }
        a
    }
}

Note, I'm aware that the code fails if the map contains key k but doesn't contain key i where 0 <= i < k. This is okay for my purposes.

Now I'm looking to do the same for arbitrarily deep multi-dimensional arrays. F开发者_C百科or example, converting between Map[Int, Map[Int, X]] to Array[Array[X]]. Unfortunately, I am getting tripped up by the types. Using the above as a base case, here's what I have so far:

def toArrayHardRec[X:ClassManifest](x:scala.collection.Map[Int, X]):Array[X] =
{
    import scala.collection.Map

    if (x.size == 0) new Array(0)
    else 
    {
        x match
        {
            case t:Map[Int, Map[Int, Y]] forSome { type Y } =>
            {
                val f0 = t.mapValues{m => toArrayHardRec[Map[Int, Y]](m)}
                toArrayHard(f0)
            }
            case _ => toArrayHard(x)
        }
    }
}

This is the error that I get:

'=>' expected but 'forSome' found.

Since this is an educational pursuit, any feedback is greatly appreciated. Specifically, I would appreciate any code critiques of my awfully java-looking code, existing scala functions that do the same thing, or suggestions for an alternative way to build these arrays.


This is meaningless:

case t:Map[Int, Map[Int, Y]] forSome { type Y }

because of erasure, all you can check for is this:

case t: Map[_, _]

which, in fact, is the only way you won't get warnings about erasure. Also, an existential type is almost always unnecessary, and, personally, I always find its syntax a bit tricky to get right. Still, this is the one case where using _ to signify existential type is adequate. Note, however, that in my own code (the first version) I can't use it, because the types won't match if I do.

Next,

arbitrarily deep multi-dimensional arrays

That's not a particular good idea. You must know what type you'll return to declare it -- if the deepness is "arbitrary", then so is the type. You can get away with a Array[AnyRef], but it will be painful to use.

Still, here is one way. I did away with all the loops, replacing them with map. Note where I take advantage of the fact that a Map is also a Function1, by calling map m. Note also that I just use map over an array (created with toArray) to avoid having to manage map creation.

def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
  def translate(x: AnyRef): AnyRef = x match {
    case map: Map[Int, AnyRef] => deMap(map)
    case s: String => s
    case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
  }
  m.keys.toArray.sorted map m map translate
}

There are two problems. To begin with, if the keys are not contiguous (Map(0 -> "a", 2 -> "b"), for example), the elements will be out of position. Here's a way around it, replacing the next to last line of code with:

  Array.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate

Here I used an empty string to stand for any hole in the arrays.

The other problem is that I assume all maps will be of type Map[Int, AnyRef]. To fix this, the could would have to be rewritten like this:

def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
  def translate(x: AnyRef): AnyRef = x match {
    case map: Map[_, _] => 
      val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
      deMap(validMap)
    case s: String => s
    case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
  }
  Array.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate
}

Here I discard all pairs whose keys aren't Int and values aren't AnyRef. One could further safe-check the code to test if something's missing and report and error in that case:

def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
  def translate(x: AnyRef): AnyRef = x match {
    case map: Map[_, _] => 
      val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
      if (map.size > validMap.size) {
        val wrongPairs = map.toSeq diff validMap.toSeq
        throw new IllegalArgumentException("Invalid key/value pairs found: "+wrongPairs)
      }
      deMap(validMap)
    case s: String => s
    case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
  }
  Array.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate
}

Finally, about performance. Using map the way I did means a new array will be allocated for each map. Some people take that to mean I have to iterate three times instead of one, but since each time I do a different task, it isn't really doing much more work. However, the fact that I allocate a new array each time definitely has performance impact -- both because of the allocation penalty (Java pre-initializes all arrays), and because of cache locality concerns.

One way to avoid that is using a view. When you do map, flatMap and filter over a view, you get a new view back, with that operation stored for future use (other methods might work that way too -- check the documentation, or test when in doubt). When you finally do an apply on the view object, it will apply all the operations necessary to get the specific element you have asked for. It will do so every time you apply for that element too, so performance can be both better or worse.

Here I'll start with a Range view, to avoid array allocation, and then convert the view to an Array at the end. Still, keys will create a set, imposing some overhead. After this I'll explain how to avoid that.

def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
  def translate(x: AnyRef): AnyRef = x match {
    case map: Map[_, _] => 
      val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
      if (map.size > validMap.size) {
        val wrongPairs = map.toSeq diff validMap.toSeq
        throw new IllegalArgumentException("Invalid key/value pairs found: "+wrongPairs)
      }
      deMap(validMap)
    case s: String => s
    case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
  }
  (0 to m.keys.max view) map (m getOrElse (_, "")) map translate toArray
}

You should leave view to necessary optimizations, however, instead of pro-actively using them. It isn't necessarily faster than normal collections, and its non-strictness can be tricky. Another alternative to using view is using a Stream instead. A Stream is much like a List, except it only computes it's elements on demand. That means none of the map operations will be applied until necessary. To use it, just replace the next to last line to this:

  Stream.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate toArray

The main advantage of a Stream over a view is that once a value in a Stream has been computed, it stays computed. That is also it's main disadvantage over a view, ironically enough. In this particular case, I think view is faster.

Finally, about doing a max without computing a Set (the result of keys) first. A Map is also an Iterable, and all Iterable have a max method, so you can simply do m.max. That, however, will compute the max of Tuple2[Int, AnyRef], a type for which there is no Ordering. However, max receives an implicit parameter telling it what Ordering to use, which can be thus provided:

m.max(Ordering by ((_: (Int, AnyRef))._1))

This is a bit of a mouthful, so we can define the implicit we need and use it, err, implicitly. :-)

def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
  implicit val myOrdering = Ordering by ((_: (Int, AnyRef))._1)
  def translate(x: AnyRef): AnyRef = x match {
    case map: Map[_, _] => 
      val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
      if (map.size > validMap.size) {
        val wrongPairs = map.toSeq diff validMap.toSeq
        throw new IllegalArgumentException("Invalid key/value pairs found: "+wrongPairs)
      }
      deMap(validMap)
    case s: String => s
    case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
  }
  (0 to m.max._1 view) map (m getOrElse (_, "")) map translate toArray
}

Note that max returns a tuple, key and value, so we need _1 to get the key. Also, note that we create an Ordering object at each nesting of deMap. Not bad, but it could be made better by defining it elsewhere.

Once you have done all that, a while loop will still be faster, though I have no idea by how much. Given enough JVM optimizations, they might come pretty close. On the other hand, if you really want speed, you could go all the way to assembler code. It comes down to a balance between how easy and fast it is to write the code, how easy it is to maintain it, and how fast it runs.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜