
Efficient algorithm to remove any map that is contained in another map from a collection of maps

I have a set (s) of unique maps (currently Java HashMaps) and wish to remove from it any map that is completely contained by some other map in the set (i.e. remove m from s if m.entrySet() is a subset of n.entrySet() for some other n in s).

I have an n^2 algorithm, but it's too slow. Is there a more efficient way to do this?

Edit:

The set of possible keys is small, if that helps.

Here is an inefficient reference implementation:

public void removeSubmaps(Set<Map> s) {
    Set<Map> toRemove = new HashSet<Map>();
    for (Map a : s) {
        for (Map b : s) {
            // skip comparing a map with itself, otherwise every map would be removed
            if (a != b && a.entrySet().containsAll(b.entrySet()))
                toRemove.add(b);
        }
    }
    s.removeAll(toRemove);
}


Not sure I can make this anything other than an n^2 algorithm, but I have a shortcut that might make it faster. Make a list of your maps along with the size of each map and sort it by size. A submap can have no more entries than the map that contains it, so there's never any need to compare a map against one that comes earlier (i.e. is smaller) in the sorted list.
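A minimal sketch of that shortcut, keeping the raw Map types from the question (the method name and sort order are illustrative; assumes the usual java.util imports):

public void removeSubmapsSorted(Set<Map> s) {
    // sort by size: a map can only be contained in a map that is at least as large
    List<Map> sorted = new ArrayList<Map>(s);
    Collections.sort(sorted, new Comparator<Map>() {
        public int compare(Map a, Map b) { return a.size() - b.size(); }
    });

    Set<Map> toRemove = new HashSet<Map>();
    for (int i = 0; i < sorted.size(); i++) {
        Map smaller = sorted.get(i);
        // only compare against maps later in the list, i.e. no smaller than this one
        for (int j = i + 1; j < sorted.size(); j++) {
            if (sorted.get(j).entrySet().containsAll(smaller.entrySet())) {
                toRemove.add(smaller);
                break;
            }
        }
    }
    s.removeAll(toRemove);
}

The break after the first containing map avoids redundant comparisons, but the worst case is still n^2.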


Here's another stab at it.

Decompose all your maps into a list of (key, value, map number) triples. Sort the list by key and value. Go through the list, and for each group of triples with the same key/value, emit every pair of map numbers in the group - these are all potential subset relationships. When you have the final list of pairs, sort it by map numbers. Go through this second list and count the occurrences of each pair - if the count matches the size of one of the two maps, that map is a subset of the other.
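A rough sketch of that bookkeeping (hypothetical method; it groups and counts with hash maps rather than explicit sorts, which comes to the same thing, and since the maps are unique, a pair that shares exactly as many entries as the smaller map has marks a proper submap):

private <K, V> void removeSubmapsByPairCounting(Set<Map<K, V>> s) {
    List<Map<K, V>> mapList = new ArrayList<>(s);
    int n = mapList.size();

    // group map indices by (key, value) entry
    Map<Map.Entry<K, V>, List<Integer>> byEntry = new HashMap<>();
    for (int i = 0; i < n; i++) {
        for (Map.Entry<K, V> e : mapList.get(i).entrySet()) {
            Map.Entry<K, V> entry = new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue());
            byEntry.computeIfAbsent(entry, k -> new ArrayList<>()).add(i);
        }
    }

    // within each group, every ordered pair of map indices shares this entry; count the pairs
    Map<Long, Integer> shared = new HashMap<>();
    for (List<Integer> group : byEntry.values())
        for (int a : group)
            for (int b : group)
                if (a != b)
                    shared.merge((long) a * n + b, 1, Integer::sum); // encode the pair (a, b) as one long

    // if the pair (a, b) shares exactly b.size() entries, map b is contained in map a
    Set<Map<K, V>> toRemove = new HashSet<>();
    for (Map.Entry<Long, Integer> e : shared.entrySet()) {
        int b = (int) (e.getKey() % n);
        if (e.getValue() == mapList.get(b).size())
            toRemove.add(mapList.get(b));
    }
    s.removeAll(toRemove);
}

Note that an empty map never appears in any group, so it would need to be special-cased.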


Edit: My original interpretation of the problem was incorrect; here is a new answer based on my re-read of the question.

You can create a custom hash function for each HashMap that returns the product of the hash values of all its entries. Sort the list of hash values, loop from the biggest value down, and find all divisors among the smaller hash values; these are the possible subsets of that hashmap. Use set.containsAll() to confirm before marking them for removal.

This effectively transforms the problem into the mathematical problem of finding possible divisors within a collection, and you can apply all the common divisor-search optimizations.

Complexity is O(n^2), but if many hashmaps are subsets of others the actual time spent can be a lot better, approaching O(n) in the best case (if all the hashmaps are subsets of one). And even in the worst case, checking divisibility is a lot cheaper than set.containsAll(), which is itself linear in the number of entries in the hashmap.

You might also want to create a simple hash function for the hashmap entry objects that returns smaller numbers, to keep the multiplications and divisions fast.
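A rough sketch of that filter (hypothetical method, assuming java.math.BigInteger so the products cannot overflow; as an illustrative twist it assigns each distinct entry its own small prime rather than using the entry's hashCode, which keeps every factor positive):

private <K, V> void removeSubmapsByProductFilter(Set<Map<K, V>> s) {
    // assign every distinct (key, value) entry its own small prime; a map's signature is
    // the product of its entries' factors, so a contained map's signature must divide
    // the containing map's signature
    Map<Map.Entry<K, V>, BigInteger> factor = new HashMap<>();
    BigInteger[] nextPrime = { BigInteger.valueOf(2) };

    List<Map<K, V>> maps = new ArrayList<>(s);
    List<BigInteger> signature = new ArrayList<>();
    for (Map<K, V> m : maps) {
        BigInteger sig = BigInteger.ONE;
        for (Map.Entry<K, V> e : m.entrySet()) {
            Map.Entry<K, V> entry = new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue());
            sig = sig.multiply(factor.computeIfAbsent(entry, k -> {
                BigInteger p = nextPrime[0];
                nextPrime[0] = p.nextProbablePrime();
                return p;
            }));
        }
        signature.add(sig);
    }

    Set<Map<K, V>> toRemove = new HashSet<>();
    for (int i = 0; i < maps.size(); i++) {
        for (int j = 0; j < maps.size(); j++) {
            if (i == j || maps.get(j).size() >= maps.get(i).size())
                continue;
            // divisibility is the cheap filter; containsAll() is the confirmation
            if (signature.get(i).mod(signature.get(j)).signum() == 0
                    && maps.get(i).entrySet().containsAll(maps.get(j).entrySet()))
                toRemove.add(maps.get(j));
        }
    }
    s.removeAll(toRemove);
}

With distinct primes per entry the divisibility test happens to be exact, but the containsAll() confirmation is kept because the original suggestion multiplies arbitrary hash values, where false positives are possible.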


Here's a subquadratic (O(N**2 / log N)) algorithm for finding maximal sets from a set of sets: An Old Sub-Quadratic Algorithm for Finding Extremal Sets.

But if you know your data distribution, you can do much better in the average case.


This is what I ended up doing. It works well in my situation, as there is usually some value that is shared by only a small number of maps. Kudos to Mark Ransom for pushing me in this direction.

In prose: Index the maps by key/value pair, so that each key/value pair is associated with the set of maps that contain it. Then, for each map: find the smallest set associated with one of its key/value pairs; this set is typically small for my data. Each of the maps in this set is a potential 'supermap'; no other map could be a 'supermap', as it would not contain this key/value pair. Search this set for a supermap. Finally, remove all the identified submaps from the original set.

private <K, V>  void removeSubmaps(Set<Map<K, V>> maps) {
    // index the maps by key/value
    List<Map<K, V>> mapList = new ArrayList<>(maps);
    Map<K, Map<V, List<Integer>>> values = new HashMap<>();
    for (int i = 0, uniqueRowsSize = mapList.size(); i < uniqueRowsSize; i++) {
        Map<K, V> row = mapList.get(i);
        Integer idx = i;
        for (Map.Entry<K, V> entry : row.entrySet()) {
            values.computeIfAbsent(entry.getKey(), k -> new HashMap<>())
                  .computeIfAbsent(entry.getValue(), v -> new ArrayList<>())
                  .add(idx);
        }
    }

    // find submaps
    Set<Map<K, V>> toRemove = new HashSet<>();
    for (Map<K, V> submap : mapList) {
        // find the smallest set of maps with a matching key/value
        List<Integer> smallestList = null;
        for (Map.Entry<K, V> entry : submap.entrySet()) {
            List<Integer> list = values.get(entry.getKey()).get(entry.getValue());
            if (smallestList  == null || list.size() < smallestList.size())
                smallestList = list;
        }

        // compare with each of the maps in that set
        for (int i : smallestList) {
            Map<K, V> map = mapList.get(i);
            if (isSubmap(submap, map))
                toRemove.add(submap);
        }
    }

    maps.removeAll(toRemove);
}

private <K,V> boolean isSubmap(Map<K, V> submap, Map<K,V> map){
    if (submap.size() >= map.size())
        return false;
    for (Map.Entry<K,V> entry : submap.entrySet()) {
        V other = map.get(entry.getKey());
        if (other == null)
            return false;
        if (!other.equals(entry.getValue()))
            return false;
    }
    return true;
}