开发者

Automatic Clustering With Hash/Map of Vectors in C++

I have the following values for example:

0 0 0 1 3 2

These values refers to cluster Ids, where the member of the cluster is the index of the vector. Hence we want to get this sort of output:

Cluster 0 -> 0,1,2
Cluster 1 -> 3
Cluster 2 -> 5
Cluster 3 -> 4

I tried the following construct but it doesn't seem to work: What's the way to do it?

#include <iostream>      
#include <vector>        
#include <fstream>
#include <sstream>
#include <map>
using namespace std;

int main  ( int arg_count, char *arg_vec[] ) {
    if (arg_count !=2 ) {
        cerr << "expected one argument" << endl;
        return EXIT_FAILURE;      
    }

    string line;
    ifstream myfile (arg_vec[1]);

    map <int, vector <int> > CCTagMap;

    if (myfile.is_ope开发者_如何学Gon())
    {
        // Skip First Line        
        getline(myfile,line);     

        while (getline(myfile,line) )
        {
            stringstream ss(line);    
            int CcId;
            int TagId = -1;           



            vector <int> Temp;        

            while (ss >> CcId) {      
                TagId++;
                cout << CcId << "-" << TagId <<  endl; 

                # this way to cluster doesn't seem to work
                CCTagMap.insert(make_pair(CcId,Temp.push_back(TagId)));
            }


        }
        myfile.close();
    }
    else  { cout << "Unable to open file\n";} 
    return 0;
}


You are overwriting the vector each time during insert. What you can do is something like:

map <int, vector <int> >::iterator iter = CCTagMap.find(CcId);
if(iter == CCTagMap.end())
{
  vector <int> Temp;  
  temp.push_back(TagId);
  CCTagMap[CcId] = temp;
}
else
{
  iter->second.push_back(TagId);
}


I supposed I would post a solution going for performance (remember, you should always try to optimize a soon as possible according to Knuth! unless I am mistaken ;) ).

CCTagMap[CcId].push_back(TagId);

Yeah, I really thought I should indicate this solution...

  • save some typing
  • more performant: only one lookup, instead of one for find and one for insertion. No temporary (though they would probably be optimized anyway).
  • idiomatic ?

For a decomposed view of what's going on here:

std::vector<int>& CcVector = CcTagMap[CcId]; // [1]
CcVector.push_back(TagId);                   // [2]
  1. map::operator[] is a mutating operator: it returns either the element stored for this key or inserts a newly default constructed element if none existed. It then returns a reference to the element (whether new or not).
  2. We use the reference for a direct push_back, nice and clean.


What you are doing wrong is rewriting the vectors in the map each time.

Instead of:

CCTagMap.insert(make_pair(CcId,Temp.push_back(TagId)));

Try:

if ( CCTagMap.find( CcId ) == CCTagMap.end() )
{
    CCTagMap.insert(make_pair(CcId,vector<int>()));
}
CCTagMap[CcId].push_back( TagId );

Or even better,

map <int, vector<int> >::iterator iter = CCTagMap.find(CcId);
if ( iter == CCTagMap.end() )
{
    CCTagMap.insert(make_pair(CcId,vector<int>())).first->second.push_back( TagId );
}
else
{
    iter->second.push_back( TagId );
}


How about this? Instead of your CCTagMap, define:

std::vector<std::vector<int> > clusters;

then for each TagId and CcId do:

if (clusters.size() <= CcId)
  clusters.resize(CcId + 1);
clusters[CcId].append(TagId);
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜