To Use or Not to Use Data.Map
I'm currently working on a Haskell API. The latter provides some functions that currently take a list of lists as input, i.e. [(String,[(String, Double)])]
.
For visualization purposes, here's a sample of the list of lists mentioned above:
[
("A", [
("I1", 1),
("I2", 2),
]
),
("B", [
("I1", 3),
]
)
]
I've defined some private helper functions. One helper function will search for specific entries in this list (Data.List.find
= O(n)
); another one will perform intersections; and another function will transform the list presented above to the following one:
[
("I1", [
("A", 1),
("B", 3),
]
),
("I2", [
("A", 2),
]
)
]
The function that performs the transformation uses Data.Map
, since it offers some functions that simplify that process a lot, like Data.Map.un开发者_如何学编程ionWith
and Data.Map.insertWith
. Well, since the transformation function had to call Data.Map.fromList
and Data.Map.toList
, I thought it would be nice to have a map of maps instead of a list of lists from the beginning. And so I changed my sample input to match the map of maps requirement.
Again, for visualization purposes, here's the list from above as a map of maps:
Map.fromList [
("A", Map.fromList [
("I1", 1),
("I2", 2),
]
),
("B", Map.fromList [
("I1", 3),
]
)
]
Thanks to this step my code lost a few lines, and thanks to Data.Map.lookup
, finding a desired now only takes O(log n)
time.
Nonetheless, I'm currently asking myself if this really is a good solution? Is a map of maps the way to go? Or should the transformation function work with Data.Map.fromList
and Data.Map.toList
, and let the rest work with list of lists? Or better yet, is there a data structure that is more suitable for this kind of work?
I'm really looking forward to your replies.
Initialization of the map-of-maps still only takes O(n)
.
Consider the list-of-lists first.
Let's say the outer list is [ a1, a2, ..., ap ], and each inner item is aj = ( lj, [ b0, b1, ..., bqj ]). Then construction of the list-of-lists takes O(n = ∑j=1p qj).
Initializing an inner map takes mj. = O(qj). Initializing the map-of-maps takes O(∑j=1p mj) = O(n).
This smells like graphs and edges. One slightly different approach, which may or may not work is to rework your problem so instead of [(String,[(String,Double)])]
you simply operate on 2-tuples of strings. Then you have [((String, String), Double)]
and the resulting map is of type Data.Map.Map (String, String) Double
.
Alternatively, if the space of string keys is limited, and can thus be mapped efficiently into machine ints, look into using an IntMap. Same semantics as a map except that the keys MUST be machine ints (Int32 or Int64). Will have much better performance.
Of course this depends on your actual data, but maybe you could use a Multimap instead? There are implementations floating around (e.g. http://hackage.haskell.org/packages/archive/Holumbus-Distribution/0.0.1.1/doc/html/Holumbus-Data-MultiMap.html ) but I didn't try them out.
精彩评论