Parsing a line with a variable number of entries in C or C++ (no boost)
I have a file containing lines of the form,
double mass, string seq, int K, int TS, int M, [variable number of ints]
688.83 AFTDSK 1 1 0 3384 2399 1200
790.00 MDSSTK 1 3 1 342 2
I need a (preferably simple) way of parsing this file without boost. If the number of values per line had been constant then I would have used the solution here.
Each line will become an object of class Peptide:
class Peptide {
public:
    double mass;
    string sequence;
    int numK;
    int numPTS;
    int numM;
    set<int> parents;
};
The first three integers have specific variable names in the object while all the following integers need to be inserted into a set.
I was fortunate to get two excellent responses, but the run-time difference made the C implementation the best answer for me.
If you want to use C++, use C++:
// Requires <fstream>, <sstream>, <string> and <list>.
std::list<Peptide> list;
std::ifstream file("filename.ext");
std::string line;
while (std::getline(file, line)) {
    // Ignore empty lines.
    if (line.empty()) continue;
    // Stringstreams are your friends!
    std::istringstream row(line);
    // Read ordinary data members.
    Peptide peptide;
    row >> peptide.mass
        >> peptide.sequence
        >> peptide.numK
        >> peptide.numPTS
        >> peptide.numM;
    // Read numbers until reading fails.
    int parent;
    while (row >> parent)
        peptide.parents.insert(parent);
    // Do whatever you like with each peptide.
    list.push_back(peptide);
}
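If malformed lines are a concern, the same stringstream idea can be wrapped in a function that reports whether the five fixed fields were actually read. This is only a sketch: the name parsePeptide and the struct defaults are my own choices for illustration, not part of the question.

```cpp
#include <set>
#include <sstream>
#include <string>

// Same fields as the Peptide class in the question, with default values added.
struct Peptide {
    double mass = 0.0;
    std::string sequence;
    int numK = 0;
    int numPTS = 0;
    int numM = 0;
    std::set<int> parents;
};

// Hypothetical helper: fills `out` and returns true only if the five
// fixed fields were all extracted; `out` is left untouched on failure.
bool parsePeptide(const std::string& line, Peptide& out) {
    std::istringstream row(line);
    Peptide parsed;
    if (!(row >> parsed.mass >> parsed.sequence
              >> parsed.numK >> parsed.numPTS >> parsed.numM)) {
        return false;  // too few fields, or a field of the wrong type
    }
    // Everything that still parses as an int is a parent id.
    int parent;
    while (row >> parent) {
        parsed.parents.insert(parent);
    }
    out = parsed;
    return true;
}
```

Inside the getline loop you would then write something like `if (parsePeptide(line, peptide)) list.push_back(peptide);` and skip or log the lines that fail.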
The best way I know of to parse an ASCII text file is to read it line by line and use strtok. It's a C function, and it breaks your input into individual tokens for you. Note that strtok modifies the buffer it tokenizes, so it has to be given a writable copy of the line. You can then use the conversion functions strtol and strtod to parse the numeric values. For the file format you specified, I'd do something like this:
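// Requires <cstdlib>, <cstring>, <fstream>, <iostream>, <set> and <string>.
string line;
ifstream f(argv[1]);
if (!f.is_open()) {
    cout << "The file you specified could not be read." << endl;
    return 1;
}
// Loop on getline itself; testing eof() first would process a bogus final line.
while (getline(f, line)) {
    // Skip blank lines and '#' comment lines.
    if (line.empty() || line[0] == '#') continue;
    // strtok writes into its buffer, so tokenize a mutable copy of the line.
    string buf = line;
    // Fetch the five fixed fields; strtok returns NULL once tokens run out.
    char *tok[5];
    for (int i = 0; i < 5; i++)
        tok[i] = strtok(i == 0 ? &buf[0] : NULL, " ");
    if (tok[4] == NULL) continue;  // malformed line: fewer than five fields
    Peptide pep;
    pep.mass = strtod(tok[0], NULL);
    pep.sequence = tok[1];
    pep.numK = strtol(tok[2], NULL, 10);
    pep.numPTS = strtol(tok[3], NULL, 10);
    pep.numM = strtol(tok[4], NULL, 10);
    // Every remaining token is a parent id.
    char *ptr;
    while ((ptr = strtok(NULL, " ")) != NULL)
        pep.parents.insert(strtol(ptr, NULL, 10));
    cout << "mass: " << pep.mass << endl
         << "sequence: " << pep.sequence << endl
         << "numK: " << pep.numK << endl
         << "numPTS: " << pep.numPTS << endl
         << "numM: " << pep.numM << endl
         << "parents:" << endl;
    set<int>::iterator it;
    for (it = pep.parents.begin(); it != pep.parents.end(); ++it)
        cout << "\t- " << *it << endl;
}
f.close();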
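A possible variation, in case copying each line for strtok bothers you: strtod and strtol can report where they stopped through their end-pointer argument, so you can walk the line in place without modifying it. Treat this as a rough sketch rather than a drop-in replacement. The helper names readInt and parsePeptideLine are made up for this example, and I have not measured whether it is any faster.

```cpp
#include <cstdlib>
#include <set>
#include <string>

// Same fields as the Peptide class in the question, with default values added.
struct Peptide {
    double mass = 0.0;
    std::string sequence;
    int numK = 0;
    int numPTS = 0;
    int numM = 0;
    std::set<int> parents;
};

// Hypothetical helper: reads one integer at `cursor` and advances past it.
// strtol skips leading whitespace and leaves `end == cursor` if it found no digits.
static bool readInt(const char*& cursor, int& value) {
    char* end = nullptr;
    long parsed = std::strtol(cursor, &end, 10);
    if (end == cursor) return false;
    value = static_cast<int>(parsed);
    cursor = end;
    return true;
}

// Hypothetical helper: parses one line in place, without copying or modifying it.
// On failure `out` may be partially filled.
bool parsePeptideLine(const char* text, Peptide& out) {
    out = Peptide();
    char* end = nullptr;
    out.mass = std::strtod(text, &end);
    if (end == text) return false;  // no number at the start of the line
    const char* cursor = end;

    // The sequence is the next run of non-blank characters.
    while (*cursor == ' ' || *cursor == '\t') ++cursor;
    const char* start = cursor;
    while (*cursor != '\0' && *cursor != ' ' && *cursor != '\t') ++cursor;
    if (cursor == start) return false;
    out.sequence.assign(start, cursor);

    // Three fixed integers, then any number of parent ids.
    if (!readInt(cursor, out.numK) ||
        !readInt(cursor, out.numPTS) ||
        !readInt(cursor, out.numM)) {
        return false;
    }
    int parent;
    while (readInt(cursor, parent)) {
        out.parents.insert(parent);
    }
    return true;
}
```

With the loop above you would call it as `parsePeptideLine(line.c_str(), pep)` and skip the line when it returns false.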