开发者

how to get median value from sorted map

I am using a std::map. Sometimes I will do an operation like: finding the median value of all items. e.g if I add

1 "s"
2 "sdf"
3 "sdfb"
4 "njw"
5 "loo"

then the median is 3.

Is there some solut开发者_如何转开发ion without iterating over half the items in the map?


I think you can solve the problem by using two std::map. One for smaller half of items (mapL) and second for the other half (mapU). When you have insert operation. It will be either case:

  • add item to mapU and move smallest element to mapL
  • add item to mapL and move greatest element to mapU

In case the maps have different size and you insert element to the one with smaller number of elements you skip the move section. The basic idea is that you keep your maps balanced so the maximum size difference is 1 element. As far as I know STL all operations should work in O(ln(n)) time. Accessing smallest and greatest element in map can be done by using iterator. When you have n_th position query just check map sizes and return greatest element in mapL or smallest element in mapR.

The above usage scenario is for inserting only but you can extend it to deleting items as well but you have to keep track of which map holds item or try to delete from both.

Here is my code with sample usage:

#include <iostream>
#include <string>
#include <map>
using namespace std;

typedef pair<int,string> pis;
typedef map<int,string>::iterator itis;

map<int,string>Left;
map<int,string>Right;

itis get_last(map<int,string> &m){
    return (--m.end());
}

int add_element(int key, string val){
    if (Left.empty()){
        Left.insert(make_pair(key,val));
        return 1;
    }

    pis maxl = *get_last(Left);
    if (key <= maxl.first){
        Left.insert(make_pair(key,val));
        if (Left.size() > Right.size() + 1){
            itis to_rem = get_last(Left);
            pis cpy = *to_rem;
            Left.erase(to_rem);
            Right.insert(cpy);
        }
        return 1;
    } else {
        Right.insert(make_pair(key,val));
        if (Right.size() > Left.size()){
            itis to_rem = Right.begin();
            pis cpy = *to_rem;
            Right.erase(to_rem);
            Left.insert(*to_rem);
        }
        return 2;
    }   
}

pis get_mid(){
    int size = Left.size() + Right.size();
    if (Left.size() >= size / 2){
        return *(get_last(Left));
    }
    return *(Right.begin());
}

int main(){
    Left.clear();
    Right.clear();

    int key;
    string val;
    while (!cin.eof()){
        cin >> key >> val;
        add_element(key,val);
        pis mid = get_mid();
        cout << "mid " << mid.first << " " << mid.second << endl;
    }
}


I think the answer is no. You cannot just jump to the N / 2 item past the beginning because a std::map uses bidirectional iterators. You must iterate through half of the items in the map. If you had access to the underlying Red/Black tree implementation that is typically used for the std::map, you might be able to get close like in Dani's answer. However, you don't have access to that as it is encapsulated as an implementation detail.


Try:

typedef std::map<int,std::string>  Data;
Data           data;
Data::iterator median = std::advance(data.begin(), data.size() / 2); 

Works if the size() is odd. I'll let you work out how to do it when size() is even.


In self balancing binary tree(std::map is one I think) a good approximation would be the root.
For exact value just cache it with a balance indicator, and each time an item added below the median decrease the indicator and increase when item is added above. When indicator is equal to 2/-2 move the median upwards/downwards one step and reset the indicator.


If you can switch data structures, store the items in a std::vector and sort it. That will enable accessing the middle item positionally without iterating. (It can be surprising but a sorted vector often out-performs a map, due to locality. For lookups by the sort key you can use binary search and it will have much the same performance as a map anyway. See Scott Meyer's Effective STL.)


If you know the map will be sorted, then get the element at floor(length / 2). If you're in a bit twiddly mood, try (length >> 1).


I know no way to get the median from a pure STL map quickly for big maps. If your map is small or you need the median rarely you should use the linear advance to n/2 anyway I think - for the sake of simplicity and being standard.

You can use the map to build a new container that offers median: Jethro suggested using two maps, based on this perhaps better would be a single map and a continuously updated median iterator. These methods suffer from the drawback that you have to reimplement every modifiying operation and in jethro's case even the reading operations.

A custom written container will also do what you what, probably most efficiently but for the price of custom code. You could try, as was suggested to modify an existing stl map implementation. You can also look for existing implementations.

There is a super efficient C implementation that offers most map functionality and also random access called Judy Arrays. These work for integer, string and byte array keys.


Since it sounds like insert and find are your two common operations while median is rare, the simplest approach is to use the map and std::advance( m.begin(), m.size()/2 ); as originally suggested by David Rodríguez. This is linear time, but easy to understand so I'd only consider another approach if profiling shows that the median calls are too expensive relative to the work your app is doing.


The nth_element() method is there for you for this :) It implements the partition part of the quick sort and you don't need your vector (or array) to be sorted. And also the time complexity is O(n) (while for sorting you need to pay O(nlogn)).


For a sortet list, here it is in java code, but i assume, its very easy to port to c++:

    if (input.length % 2 != 0) {
        return input[((input.length + 1) / 2 - 1)];
    } else {
        return 0.5d * (input[(input.length / 2 - 1)] + input[(input.length / 2 + 1) - 1]);
    }
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜