convert a string to int
I have a large file where each line contains space-separated integers. The task is to sparse this file line-by-line. For the string to int conversion I have three solutions:
static int stringToIntV1(const string& str) {
return (atoi(str.c_str()));
}
However, if I pass a malformed string, it doesn't produce any error. For instance the string "123error" is converted to 123.
Second solution:
static int stringToIntV2(const string& str)
{
int result;
istringstream myStream(str);
if (myStream >> result) {
return result;
}
// else
throw domain_error(str + " is not an int!");
}
I have the same problem here, malformed strings don't raise an error.
Third solution with Boost (found at Boost Library):
static int stringToIntV3(const string& str)
{
int iResult = 0;
try {
iResult = lexical_cast<int>(str);
}
catch(bad_lexical_cast &) {
throw domain_error(str + " is not an int!");
}
return iResult;
}
This one gives correct result.
However, there is a significant difference in the execution time. Testing on a large text file (32 MB), I got the following times:
- (1) with atoi: 4.522s (winner)
- (2) with istringstream: 15.303s (very slow)
- (3) with lexical_cast: 10.958s (between the two)
My question: do you know how to notice malformed strings with atoi? It would give the fastest solution. Or do you know a better 开发者_如何转开发solution?
Update: Thanks for the answers. Following the tips, I came up with this solution:
static int stringToIntV4(const string& str)
{
char * pEnd;
const char * c_str = str.c_str();
int result = strtol(c_str, &pEnd, 10);
if (pEnd == c_str+str.length()) {
return result;
}
// else
throw domain_error("'" + str + "'" + " is not an int!");
}
The good news is that it yields if there is a problem and as efficient as the atoi
version.
I'd use strtol
. It takes a parameter that it sets to point at the first character it couldn't convert, so you can use that to determine whether the entire string was converted.
Edit: as far as speed goes, I'd expect it to be slightly slower than atoi
, but faster than the others you tried.
The strtol
function gives you back a pointer to the next character in the string. You could check that character after conversion to see if it is not whitespace, indicating error. I ran a simple test and the performance of strtol seems to be comparable to atoi.
I know this one has been covered with existing functions. If performance is a paramount concern, it is not much more work to write your own conversion to do exactly what you need. For example, if you know there will not be any leading spaces or all your numbers are positive, don't deal with those cases. The compiler should be able to inline this function too.
#include <ctype.h>
#include <string>
inline int customStringToInt( const std::string &str ) {
register const char *p = str.c_str(), *pEnd = p + str.size();
// Remove leading whitespace (remove if no leading whitespace).
while( p != pEnd && isspace(*p) ) ++p;
// Handle neg/pos sign (remove if no sign).
int sign = 1;
if( p != pEnd ) {
if( *p == '-' ) { ++p; sign = -1; }
else if( *p == '+' ) { ++p; }
}
// String without any digits is not OK (blank != 0) (remove if it is OK).
if( p == pEnd ) throw domain_error("'" + str + "'" + " has no digits.");
// Handle digits.
register int i = 0;
while( p != pEnd )
if( isdigit(*p) ) i = i * 10 + (*p++ - '0');
else throw domain_error("'" + str + "'" + " contains non-digits.");
return sign * i;
}
Can you simply check the string before calling atoi?
isdigit()
is a wonderful little function, and you should have no problem writing isalldigits()
to check a string in a tight, fast loop.
(if you happen to work with decimals, or +/- signs, you might want to add that in too)
So, simply verify that the string is all digits, then call atoi. Should be pretty fast.
I did my own small test. It seem to take about the same amount of time to use atoi() as with stream operators. I have a file with 2,000,000 numbers. 10 numbers on each line (though the code does not use this fact).
The first version uses atoi() which I admit took me a while to get correct and could be more effecient. Updates accepted and to get it more effecient.
The stream one. Took 20 seconds to write and worked out of the box.
Timing Results Are:
> time ./a.exe 1
AtoI()
6.084u 0.156s 0:06.33 98.4% 0+0k 0+0io 8619pf+0w
> time ./a.exe
Iterator()
4.680u 0.296s 0:05.05 98.4% 0+0k 0+0io 6304pf+0w
Code:
#include <vector>
#include <iostream>
#include <iterator>
#include <fstream>
#include <iostream>
int main(int argc,char* argv[])
{
std::vector<int> data;
std::ifstream vals("rand.txt");
if (argc > 1)
{
std::cout << "AtoI()\n";
std::string line;
while(std::getline(vals,line))
{
std::string::size_type loop = 0;
while(loop < line.size())
{
while(isspace(line[loop]) && (loop < line.size()))
{ ++loop;
}
std::string::size_type end = line.find_first_not_of("0123456789",loop);
data.push_back(atoi(line.substr(loop,end - loop).c_str()));
loop = end;
}
}
}
else
{
std::cout << "Iterator()\n";
std::copy( std::istream_iterator<int>(vals),
std::istream_iterator<int>(),
std::back_inserter(data)
);
}
}
Any leading zeroes in the numbers? if atoi returns zero and the first digit character of the number is not '0', then we have an error.
Why not convert it while you have a valid file pointer rather than loading it into a string then parsing the integer? Basically you're wasting memory and time.
In my opinion you should do this:
// somewhere in your code
ifstream fin("filename");
if(!fin) {
// handle error
}
int number;
while(getNextNumber(fin, number))
{
cout << number << endl;
}
fin.close();
// done
// some function somewhere
bool getNextNumber(ifstream& fin, int& number)
{
while(!(fin >> number))
{
fin.clear();
fin.ignore(numeric_limits<streamsize>::max(),'\n');
}
if(!fin || fin.eof())
return false;
return true;
}
精彩评论