To Use or Not to Use Data.Map

I'm currently working on a Haskell API. It provides some functions that currently take a list of lists as input, i.e. [(String, [(String, Double)])].

For visualization purposes, here's a sample of the list of lists mentioned above:

[
    ("A",   [
                ("I1", 1),
                ("I2", 2)
            ]
    ),
    ("B",   [
                ("I1", 3)
            ]
    )
]

I've defined some private helper functions. One helper function will search for specific entries in this list (Data.List.find = O(n)); another one will perform intersections; and another function will transform the list presented above to the following one:

[
    ("I1",  [
                ("A", 1),
                ("B", 3)
            ]
    ),
    ("I2",  [
                ("A", 2)
            ]
    )
]

The function that performs the transformation uses Data.Map, since it offers some functions that simplify that process a lot, like Data.Map.unionWith and Data.Map.insertWith. Since the transformation function had to call Data.Map.fromList and Data.Map.toList, I thought it would be nice to have a map of maps instead of a list of lists from the beginning. And so I changed my sample input to match the map of maps requirement.
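For illustration, the list-based transformation described above could look roughly like the following. This is only a sketch, not the actual function from the API: the names Table and invert are hypothetical, and it assumes that each (outer key, inner key) pair appears at most once in the input.

```haskell
import qualified Data.Map as Map

-- Hypothetical alias for the input type described in the question.
type Table = [(String, [(String, Double)])]

-- Hypothetical helper: swap the roles of the outer and inner keys.
-- Internally builds a map of maps with insertWith, then converts back to lists.
invert :: Table -> Table
invert rows = Map.toList (Map.map Map.toList nested)
  where
    -- Flatten the input into (inner key, outer key, value) triples.
    triples = [ (i, o, v) | (o, cols) <- rows, (i, v) <- cols ]
    -- Insert each triple under its inner key, merging the per-key maps.
    nested = foldr add Map.empty triples
    add (i, o, v) = Map.insertWith Map.union i (Map.singleton o v)
```

Because the result comes from Data.Map.toList, both levels come back sorted by key, regardless of the order of the input.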

Again, for visualization purposes, here's the list from above as a map of maps:

Map.fromList [
    ("A",   Map.fromList [
                ("I1", 1),
                ("I2", 2)
            ]
    ),
    ("B",   Map.fromList [
                ("I1", 3)
            ]
    )
]

Thanks to this step my code lost a few lines, and thanks to Data.Map.lookup, finding a desired entry now only takes O(log n) time.
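A minimal sketch of what the map-of-maps version might look like, assuming the sample data above. The names Nested, sample, lookup2 and transposeNested are hypothetical and only serve the illustration.

```haskell
import qualified Data.Map as Map

-- Hypothetical alias for the map-of-maps representation.
type Nested = Map.Map String (Map.Map String Double)

sample :: Nested
sample = Map.fromList
  [ ("A", Map.fromList [("I1", 1), ("I2", 2)])
  , ("B", Map.fromList [("I1", 3)])
  ]

-- Two chained O(log n) lookups; Nothing if either key is missing.
lookup2 :: String -> String -> Nested -> Maybe Double
lookup2 outer inner m = Map.lookup outer m >>= Map.lookup inner

-- Swap outer and inner keys without going through lists of lists.
transposeNested :: Nested -> Nested
transposeNested m = Map.unionsWith Map.union
  [ Map.map (Map.singleton outer) cols | (outer, cols) <- Map.toList m ]
```

With this shape, intersections could presumably also be expressed directly with Data.Map.intersectionWith instead of a hand-written helper.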

Nonetheless, I'm asking myself whether this really is a good solution. Is a map of maps the way to go? Or should only the transformation function work with Data.Map.fromList and Data.Map.toList, while the rest works with lists of lists? Or better yet, is there a data structure that is more suitable for this kind of work?

I'm really looking forward to your replies.


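Initializing the map-of-maps costs O(n log n) in general, and still only O(n) if the input is already sorted by key.

Consider the list-of-lists first.

Let's say the outer list is $[a_1, a_2, \ldots, a_p]$, and each item is $a_j = (l_j, [b_1, b_2, \ldots, b_{q_j}])$. Then constructing the list-of-lists takes $O(n)$, where $n = \sum_{j=1}^{p} q_j$.

Initializing inner map $j$ with Data.Map.fromList takes $m_j = O(q_j \log q_j)$, or $m_j = O(q_j)$ if its entries are already in ascending key order (Data.Map.fromDistinctAscList). Initializing the map-of-maps therefore takes $O(\sum_{j=1}^{p} m_j)$ plus the cost of the outer map over $p$ keys: $O(n \log n)$ in general, and $O(n)$ for sorted input.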
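As a rough sketch of the two ways to build the nested structure: Data.Map.fromList is documented as O(n log n), while Data.Map.fromDistinctAscList is O(n) but requires strictly ascending keys and does not check that precondition. The names Nested, buildNested and buildNestedAsc are hypothetical.

```haskell
import qualified Data.Map as Map

-- Hypothetical alias for the map-of-maps representation.
type Nested = Map.Map String (Map.Map String Double)

-- General case: works for input in any order.
buildNested :: [(String, [(String, Double)])] -> Nested
buildNested = Map.fromList . map (fmap Map.fromList)

-- Linear-time variant. Assumption: keys are strictly ascending at both
-- levels. The precondition is NOT checked; violating it yields a broken map.
buildNestedAsc :: [(String, [(String, Double)])] -> Nested
buildNestedAsc = Map.fromDistinctAscList . map (fmap Map.fromDistinctAscList)
```

If the input order is not under your control, buildNested is the safe choice.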


This smells like graphs and edges. One slightly different approach, which may or may not work, is to rework your problem so that instead of [(String, [(String, Double)])] you simply operate on 2-tuples of strings. Then you have [((String, String), Double)], and the resulting map has type Data.Map.Map (String, String) Double.
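A sketch of the flat, pair-keyed representation, under the assumption that each (outer, inner) pair occurs once. The names Edges, fromNestedList, transposeEdges and row are hypothetical.

```haskell
import qualified Data.Map as Map
import Data.Tuple (swap)

-- Hypothetical alias: one entry per (outer key, inner key) edge.
type Edges = Map.Map (String, String) Double

-- Flatten the original list of lists into a single pair-keyed map.
fromNestedList :: [(String, [(String, Double)])] -> Edges
fromNestedList rows =
  Map.fromList [ ((o, i), v) | (o, cols) <- rows, (i, v) <- cols ]

-- The transformation from the question becomes a key swap.
transposeEdges :: Edges -> Edges
transposeEdges = Map.mapKeys swap

-- All entries for one outer key. Note: this simple version scans the
-- whole map, so it is O(n) rather than O(log n).
row :: String -> Edges -> [(String, Double)]
row o es = [ (i, v) | ((o', i), v) <- Map.toList es, o' == o ]
```

The trade-off is that single-entry lookups and the transposition get simpler, while "give me everything under key A" is less direct than with a map of maps.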

Alternatively, if the space of string keys is limited and can thus be mapped efficiently to machine ints, look into using an IntMap. It has the same semantics as a map, except that the keys must be of type Int (a machine integer). It will typically perform much better.
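One way this could be set up, as a sketch only: number every distinct string once, then store the data under those numbers. The names Table, IntTable, keyIds and encode are hypothetical, and the numbering scheme (sorted order, starting at 0) is just one possible choice.

```haskell
import qualified Data.IntMap as IntMap
import qualified Data.Map as Map
import qualified Data.Set as Set

type Table = [(String, [(String, Double)])]
type IntTable = IntMap.IntMap (IntMap.IntMap Double)

-- Assign 0, 1, 2, ... to every distinct key (outer and inner) in sorted order.
keyIds :: Table -> Map.Map String Int
keyIds rows = Map.fromList (zip (Set.toAscList names) [0 ..])
  where
    names = Set.fromList (concat [ o : map fst cols | (o, cols) <- rows ])

-- Return the id table together with the Int-keyed data.
encode :: Table -> (Map.Map String Int, IntTable)
encode rows = (ids, table)
  where
    ids = keyIds rows
    -- Safe here: every key occurring in rows was numbered by keyIds.
    key s = ids Map.! s
    table = IntMap.fromList
      [ (key o, IntMap.fromList [ (key i, v) | (i, v) <- cols ])
      | (o, cols) <- rows
      ]
```

Whether this pays off depends on how often strings must be translated to ids at the API boundary; that translation is itself a Data.Map lookup.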


Of course this depends on your actual data, but maybe you could use a Multimap instead? There are implementations floating around (e.g. http://hackage.haskell.org/packages/archive/Holumbus-Distribution/0.0.1.1/doc/html/Holumbus-Data-MultiMap.html ), but I haven't tried them out.
