开发者

Get top 5 algorithm from a container?

I have a class(object), User. This user has 2 private attributes, "name" and "popularity". I store the objects into a vector (container).

From the container, I need to find the top 5 most popular user, how do I do that? (I have an ugly code, I will post here, if you have a better approach, please let me know. Feel free to use other container, if you think vector is not a good choice, but please use only: map or multimap, list, vector or array, because I only know how to use these.) My current code is:

int top5 = 0, top4 = 0, top3 = 0, top2 = 0, top1 = 0;
vector<User>::iterator it;

for (it = user.begin(); it != user.end(); ++it) 
{
    if( it->getPopularity() > top5){
        if(it->getPopularity() > top4){
            if(it->getPopularity() > top3){
                if(it->g开发者_如何学编程etPopularity() > top2){
                    if(it->getPopularity() > top1){
                        top1 = it->getPopularity();
                        continue;
                    } else {
                        top2 = it->getPopularity();
                        continue;
                    }
                } else {
                    top3 = it->getPopularity();
                    continue;
                }
            }
        } else {
            top4 = it->getPopularity();
            continue;
        }
    } else {
        top5 = it->getPopularity();
        continue;
    }
}

I know the codes is ugly and might be prone to error, thus if you have better codes, please do share with us (us == cpp newbie). Thanks


You can use the std::partial_sort algorithm to sort your vector so that the first five elements are sorted and the rest remains unsorted. Something like this (untested code):

bool compareByPopularity( User a, User b ) {
    return a.GetPopularity() > b.GetPopularity();
}

vector<Users> getMostPopularUsers( const vector<User> &users, int num ) {
    if ( users.size() <= num ) {
        sort( users.begin(), users.end(), compareByPopularity );
    } else {
        partial_sort( users.begin(), users.begin() + num, users.end(), 
                      compareByPopularity );
    }
    return vector<Users>( users.begin(), users.begin() + num );
}


Why don't you sort (std::sort or your own implementation of Quick Sort) the vector based on popularity and take the first 5 values ?

Example:

bool UserCompare(User a, User b) { return a.getPopularity() > b.getPopularity(); }
...
std::sort(user.begin(), user.end(), UserCompare);
// Print first 5 users


If you just want top 5 popular uses, then use std::partial_sort().

    class User
    {
    private:
        string name_m;
        int popularity_m;
    public:
        User(const string& name, int popularity) : name_m(name), popularity_m(popularity) { }
        friend ostream& operator<<(ostream& os, const User& user)
        {
            return os << "name:" << user.name_m << "|popularity:" << user.popularity_m << "\n";
            return os;
        }

        int Popularity() const 
        {
            return popularity_m;
        }

    };

    bool Compare(const User& lhs, const User& rhs)
    {
        return lhs.Popularity() > rhs.Popularity();
    }

    int main()
    {
        // c++0x. ignore if you don't want it.
        auto compare = [](const User& lhs, const User& rhs) -> bool 
                  { return lhs.Popularity() > rhs.Popularity(); };

        partial_sort(users.begin(), users.begin() + 5, users.end(), Compare);

        copy(users.begin(), users.begin() + 5, ostream_iterator<User>(std::cout, "\n"));
    }


First off, cache that it->getPopularity() so you don't have to keep repeating it.

Secondly (and this is much more important): Your algorithm is flawed. When you find a new top1 you have to push the old top1 down to the #2 slot before you save the new top1, but before you do that you have to push the old top2 down to the #3 slot, etc. And that is just for a new top1. You are going to have to do something similar for a new top2, a new top3, etc. The only one you can paste in without worrying about pushing things down the list is when you get a new top5. The correct algorithm is hairy. That said, the correct algorithm is much easier to implement when your topN is an array rather than a bunch of separate values.

Thirdly (and this is even more important than the second point): You shouldn't care about performance, at least not initially. The easy way to do this is to sort the entire list and pluck off the first five off the top. If this suboptimal but simple algorithm doesn't affect your performance, done. Don't bother with the ugly but fast first N algorithm unless performance mandates that you toss the simple solution out the window.

Finally (and this is the most important point of all): That fast first N algorithm is only fast when the number of elements in the list is much, much larger than five. The default sort algorithm is pretty dang fast. It has to be wasting a lot of time sorting the dozens / hundreds of items you don't care about before a pushdown first N algorithm becomes advantageous. In other words, that pushdown insertion sort algorithm may well be a case of premature disoptimization.


Sort your objects, maybe with the library if this is allowed, and then simply selecte the first 5 element. If your container gets too big you could probably use a std::list for the job.

Edit : @itsik you beat me to the sec :)


Do this pseudo code.

Declare top5 as an array of int[5] // or use a min-heap
Initialize top5 as 5 -INF

For each element A
   if A < top5[4]                  // or A < root-of-top5
      Remove top5[4] from top5     // or pop min element from heap
      Insert A to top              // or insert A to the heap


Well, I advise you improve your code by using an array or list or vector to store the top five, like this

struct TopRecord
{
    int index;
    int pop;
} Top5[5];

for(int i = 0; i<5; i++)
{
    Top5[i].index = -1;
    // Set pop to a value low enough
    Top5[i].pop = -1;
}

for(int i = 0; i< users.size(); i++)
{
    int currentpop = i->getPopularity()
    int currentindex = i;
    int j = 0;
    int temp;

    while(j < 5 && Top5[j].pop < currentpop)
    {
        temp = Top5[j].pop;
        Top[j].pop = currentpop;
        currentpop = temp;

        temp = Top5[j].index;
        Top[j].index = currentindex;
        currentindex = temp;

        j++;
    }
}


You also may consider using Randomized Select if Your aim is performance, since originally Randomized Select is good enough for ordered statistics and runs in linear time, You just need to run it 5 times. Or to use partial_sort solution provided above, either way counts, depends on Your aim.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜