recursively convert Map[Int, Map[Int, X]] to Array[Array[X]]
I am trying to write a function that converts between Maps with integer keys to the corresponding arrays. I have the base case done, but am trying to write the recursive case (i.e. multi-dimensional arrays: converting Map[Int, Map[Int, X]] to Array[Array[X]]).
This task arose out of the need to construct an array from a stream without knowing how large the array will be in advance, allowing for the possibility of elements to come off the stream in random order and the possibility of duplicate elements coming off the stream.
I have a function that does it:
def toArrayHard[X:ClassManifest](x:scala.collection.Map[Int, X]):Array[X] =
{
if (x.size == 0) new Array(0)
else
{
val max:Int = 1 + x.keys.max
val a:Array[X] = new Array(max)
var i = 0
while (i < max)
{
a(i) = x(i)
i += 1
}
a
}
}
Note, I'm aware that the code fails if the map contains key k but doesn't contain key i where 0 <= i < k. This is okay for my purposes.
Now I'm looking to do the same for arbitrarily deep multi-dimensional arrays. F开发者_C百科or example, converting between Map[Int, Map[Int, X]] to Array[Array[X]]. Unfortunately, I am getting tripped up by the types. Using the above as a base case, here's what I have so far:
def toArrayHardRec[X:ClassManifest](x:scala.collection.Map[Int, X]):Array[X] =
{
import scala.collection.Map
if (x.size == 0) new Array(0)
else
{
x match
{
case t:Map[Int, Map[Int, Y]] forSome { type Y } =>
{
val f0 = t.mapValues{m => toArrayHardRec[Map[Int, Y]](m)}
toArrayHard(f0)
}
case _ => toArrayHard(x)
}
}
}
This is the error that I get:
'=>' expected but 'forSome' found.
Since this is an educational pursuit, any feedback is greatly appreciated. Specifically, I would appreciate any code critiques of my awfully java-looking code, existing scala functions that do the same thing, or suggestions for an alternative way to build these arrays.
This is meaningless:
case t:Map[Int, Map[Int, Y]] forSome { type Y }
because of erasure, all you can check for is this:
case t: Map[_, _]
which, in fact, is the only way you won't get warnings about erasure. Also, an existential type is almost always unnecessary, and, personally, I always find its syntax a bit tricky to get right. Still, this is the one case where using _
to signify existential type is adequate. Note, however, that in my own code (the first version) I can't use it, because the types won't match if I do.
Next,
arbitrarily deep multi-dimensional arrays
That's not a particular good idea. You must know what type you'll return to declare it -- if the deepness is "arbitrary", then so is the type. You can get away with a Array[AnyRef]
, but it will be painful to use.
Still, here is one way. I did away with all the loops, replacing them with map. Note where I take advantage of the fact that a Map
is also a Function1
, by calling map m
. Note also that I just use map
over an array (created with toArray
) to avoid having to manage map creation.
def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
def translate(x: AnyRef): AnyRef = x match {
case map: Map[Int, AnyRef] => deMap(map)
case s: String => s
case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
}
m.keys.toArray.sorted map m map translate
}
There are two problems. To begin with, if the keys are not contiguous (Map(0 -> "a", 2 -> "b")
, for example), the elements will be out of position. Here's a way around it, replacing the next to last line of code with:
Array.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate
Here I used an empty string to stand for any hole in the arrays.
The other problem is that I assume all maps will be of type Map[Int, AnyRef]
. To fix this, the could would have to be rewritten like this:
def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
def translate(x: AnyRef): AnyRef = x match {
case map: Map[_, _] =>
val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
deMap(validMap)
case s: String => s
case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
}
Array.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate
}
Here I discard all pairs whose keys aren't Int
and values aren't AnyRef
. One could further safe-check the code to test if something's missing and report and error in that case:
def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
def translate(x: AnyRef): AnyRef = x match {
case map: Map[_, _] =>
val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
if (map.size > validMap.size) {
val wrongPairs = map.toSeq diff validMap.toSeq
throw new IllegalArgumentException("Invalid key/value pairs found: "+wrongPairs)
}
deMap(validMap)
case s: String => s
case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
}
Array.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate
}
Finally, about performance. Using map
the way I did means a new array will be allocated for each map
. Some people take that to mean I have to iterate three times instead of one, but since each time I do a different task, it isn't really doing much more work. However, the fact that I allocate a new array each time definitely has performance impact -- both because of the allocation penalty (Java pre-initializes all arrays), and because of cache locality concerns.
One way to avoid that is using a view
. When you do map
, flatMap
and filter
over a view
, you get a new view back, with that operation stored for future use (other methods might work that way too -- check the documentation, or test when in doubt). When you finally do an apply
on the view
object, it will apply all the operations necessary to get the specific element you have asked for. It will do so every time you apply
for that element too, so performance can be both better or worse.
Here I'll start with a Range
view, to avoid array allocation, and then convert the view to an Array
at the end. Still, keys
will create a set, imposing some overhead. After this I'll explain how to avoid that.
def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
def translate(x: AnyRef): AnyRef = x match {
case map: Map[_, _] =>
val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
if (map.size > validMap.size) {
val wrongPairs = map.toSeq diff validMap.toSeq
throw new IllegalArgumentException("Invalid key/value pairs found: "+wrongPairs)
}
deMap(validMap)
case s: String => s
case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
}
(0 to m.keys.max view) map (m getOrElse (_, "")) map translate toArray
}
You should leave view
to necessary optimizations, however, instead of pro-actively using them. It isn't necessarily faster than normal collections, and its non-strictness can be tricky. Another alternative to using view
is using a Stream
instead. A Stream
is much like a List
, except it only computes it's elements on demand. That means none of the map
operations will be applied until necessary. To use it, just replace the next to last line to this:
Stream.range(0, m.keys.max + 1) map (m getOrElse (_, "")) map translate toArray
The main advantage of a Stream
over a view
is that once a value in a Stream
has been computed, it stays computed. That is also it's main disadvantage over a view
, ironically enough. In this particular case, I think view
is faster.
Finally, about doing a max
without computing a Set
(the result of keys
) first. A Map
is also an Iterable
, and all Iterable
have a max
method, so you can simply do m.max
. That, however, will compute the max of Tuple2[Int, AnyRef]
, a type for which there is no Ordering
. However, max
receives an implicit parameter telling it what Ordering
to use, which can be thus provided:
m.max(Ordering by ((_: (Int, AnyRef))._1))
This is a bit of a mouthful, so we can define the implicit we need and use it, err, implicitly. :-)
def deMap(m: Map[Int, AnyRef]): Array[AnyRef] = {
implicit val myOrdering = Ordering by ((_: (Int, AnyRef))._1)
def translate(x: AnyRef): AnyRef = x match {
case map: Map[_, _] =>
val validMap = map collect { case (key: Int, value: AnyRef) => (key -> value) }
if (map.size > validMap.size) {
val wrongPairs = map.toSeq diff validMap.toSeq
throw new IllegalArgumentException("Invalid key/value pairs found: "+wrongPairs)
}
deMap(validMap)
case s: String => s
case other => throw new IllegalArgumentException("Expected Map or String, got "+other.getClass.toString)
}
(0 to m.max._1 view) map (m getOrElse (_, "")) map translate toArray
}
Note that max
returns a tuple, key
and value
, so we need _1
to get the key
. Also, note that we create an Ordering
object at each nesting of deMap
. Not bad, but it could be made better by defining it elsewhere.
Once you have done all that, a while
loop will still be faster, though I have no idea by how much. Given enough JVM optimizations, they might come pretty close. On the other hand, if you really want speed, you could go all the way to assembler code. It comes down to a balance between how easy and fast it is to write the code, how easy it is to maintain it, and how fast it runs.
精彩评论