Automatic Clustering With Hash/Map of Vectors in C++
I have the following values for example:
0 0 0 1 3 2
These values refers to cluster Ids, where the member of the cluster is the index of the vector. Hence we want to get this sort of output:
Cluster 0 -> 0,1,2
Cluster 1 -> 3
Cluster 2 -> 5
Cluster 3 -> 4
I tried the following construct but it doesn't seem to work: What's the way to do it?
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <map>
using namespace std;
int main ( int arg_count, char *arg_vec[] ) {
if (arg_count !=2 ) {
cerr << "expected one argument" << endl;
return EXIT_FAILURE;
}
string line;
ifstream myfile (arg_vec[1]);
map <int, vector <int> > CCTagMap;
if (myfile.is_ope开发者_如何学Gon())
{
// Skip First Line
getline(myfile,line);
while (getline(myfile,line) )
{
stringstream ss(line);
int CcId;
int TagId = -1;
vector <int> Temp;
while (ss >> CcId) {
TagId++;
cout << CcId << "-" << TagId << endl;
# this way to cluster doesn't seem to work
CCTagMap.insert(make_pair(CcId,Temp.push_back(TagId)));
}
}
myfile.close();
}
else { cout << "Unable to open file\n";}
return 0;
}
You are overwriting the vector each time during insert. What you can do is something like:
map <int, vector <int> >::iterator iter = CCTagMap.find(CcId);
if(iter == CCTagMap.end())
{
vector <int> Temp;
temp.push_back(TagId);
CCTagMap[CcId] = temp;
}
else
{
iter->second.push_back(TagId);
}
I supposed I would post a solution going for performance (remember, you should always try to optimize a soon as possible according to Knuth! unless I am mistaken ;) ).
CCTagMap[CcId].push_back(TagId);
Yeah, I really thought I should indicate this solution...
- save some typing
- more performant: only one lookup, instead of one for find and one for insertion. No temporary (though they would probably be optimized anyway).
- idiomatic ?
For a decomposed view of what's going on here:
std::vector<int>& CcVector = CcTagMap[CcId]; // [1]
CcVector.push_back(TagId); // [2]
map::operator[]
is a mutating operator: it returns either the element stored for this key or inserts a newly default constructed element if none existed. It then returns a reference to the element (whether new or not).- We use the reference for a direct
push_back
, nice and clean.
What you are doing wrong is rewriting the vectors in the map each time.
Instead of:
CCTagMap.insert(make_pair(CcId,Temp.push_back(TagId)));
Try:
if ( CCTagMap.find( CcId ) == CCTagMap.end() )
{
CCTagMap.insert(make_pair(CcId,vector<int>()));
}
CCTagMap[CcId].push_back( TagId );
Or even better,
map <int, vector<int> >::iterator iter = CCTagMap.find(CcId);
if ( iter == CCTagMap.end() )
{
CCTagMap.insert(make_pair(CcId,vector<int>())).first->second.push_back( TagId );
}
else
{
iter->second.push_back( TagId );
}
How about this? Instead of your CCTagMap
, define:
std::vector<std::vector<int> > clusters;
then for each TagId
and CcId
do:
if (clusters.size() <= CcId)
clusters.resize(CcId + 1);
clusters[CcId].append(TagId);
精彩评论