any one know how to convert a huge char array to float, very huge array, performance better than the atof/strtod/sscanf
I got a char array, a huge array char p[n] read from a txt like.
//1.txt 194.919 -241.808 234.896 195.569 -246.179 234.482 194.919 -241.808 234.896 ...
foo(char *p, float x, float y, float z) {
}
I tried to use atof, strtod, but 开发者_C百科they are real time consuming when the array is too huge, because they will call the strlen(). and the sscanf is also very slow....
I debug into the code and find that both atof() and strtod call the strlen() in the visual studio, we can check the crt code.
strtod() call:
answer = _fltin2( &answerstruct, ptr, (int)strlen(ptr), 0, 0, _loc_update.GetLocaleT());
atof() call:
return( *(double *)&(_fltin2( &fltstruct, nptr, (int)strlen(nptr), 0, 0, _loc_update.GetLocaleT())->dval) );
I also try to use strtok, but we should not change any data in the 1.txt.
so any one have the best way to convert all these to float x, y, z.
Visual studio 2008 + WIN7
If you can make additional assumptions about the format of the floating point values, parsing them yourself might increase performance.
Example code for parsing ' '
or '\n'
-separated values without exponents and no input validation:
float parsef(const char **str)
{
const char *cc = *str;
_Bool neg = (*cc == '-');
if(neg) ++cc;
float value = 0, e = 1;
for(; *cc != '.'; ++cc)
{
if(*cc == ' ' || *cc == '\n' || !*cc)
{
*str = cc;
return neg ? -value : value;
}
value *= 10;
value += *cc - '0';
}
for(++cc;; ++cc)
{
if(*cc == ' ' || *cc == '\n' || !*cc)
{
*str = cc;
return neg ? -value : value;
}
e /= 10;
value += (*cc - '0') * e;
}
}
Example code:
const char *str = "42 -15.4\n23.001";
do printf("%f\n", parsef(&str));
while(*str++);
Okay, how about doing the tokenization yourself and then calling strtod.
What I'm thinking is something like this:
char *current = ...; // initialited to the head of your character array
while (*current != '\0')
{
char buffer[64];
unsigned int idx = 0;
// copy over current number
while (*current != '\0' && !isspace(*current))
{
buffer[idx++] = *current++;
}
buffer[idx] = '\0';
// move forward to next number
while (*current != '\0' && isspace(*current))
{
current++;
}
// use strtod to convert buffer
}
Some issues with this is the tokenization is very simple. It will work for the format you posted, but if the format varies (another line uses : to separate the numbers), it won't work.
Another issue is that the code assumes all numbers have < 64 characters. If they are longer, you'll get a buffer overflow.
Also, the copying to a temporary buffer will add some overhead (but hopefully less then the overhead of constantly doing a strlen on the entire buffer). I know you said you can't change the original buffer, but can you do a temporary change (i.e. the buffer can change as as long as you return it to it's original state before you return):
char *current = ...; // initialited to the head of your character array
while (*current != '\0')
{
char *next_sep = current;
while (*next_sep != '\0' && !isspace(*next_sep))
{
next_sep++;
}
// save the separator before overwriting it
char tmp = *next_sep;
*next_sep = '\0';
// use strtod on current
// Restore the separator.
*next_sep = tmp;
current = next_sep;
// move forward to next number
while (*current != '\0' && isspace(*current))
{
current++;
}
}
This technique means no copying and no worries about buffer overflow. You do need to temporarily modify the buffer; hopefully that is
Check out this code.
It can be further optimized if there's no need to support scientific representation, '+' sign, or leading tabs.
It doesn't use strlen, or any other standard library string routine.
// convert floating-point value in string represention to it's numerical value
// return false if NaN
// F is float/double
// T is char or wchar_t
// '1234.567' -> 1234.567
template <class F, class T> inline bool StrToDouble(const T* pczSrc, F& f)
{
f= 0;
if (!pczSrc)
return false;
while ((32 == *pczSrc) || (9 == *pczSrc))
pczSrc++;
bool bNegative= (_T('-') == *pczSrc);
if ( (_T('-') == *pczSrc) || (_T('+') == *pczSrc) )
pczSrc++;
if ( (*pczSrc < _T('0')) || (*pczSrc > _T('9')) )
return false;
// todo: return false if number of digits is too large
while ( (*pczSrc >= _T('0')) && (*pczSrc<=_T('9')) )
{
f= f*10. + (*pczSrc-_T('0'));
pczSrc++;
}
if (_T('.') == *pczSrc)
{
pczSrc++;
double e= 0.;
double g= 1.;
while ( (*pczSrc >= _T('0')) && (*pczSrc<=_T('9')) )
{
e= e*10. + (*pczSrc-_T('0'));
g= g*10. ;
pczSrc++;
}
f+= e/g;
}
if ( (_T('e') == *pczSrc) || (_T('E') == *pczSrc) ) // exponent, such in 7.32e-2
{
pczSrc++;
bool bNegativeExp= (_T('-') == *pczSrc);
if ( (_T('-') == *pczSrc) || (_T('+') == *pczSrc) )
pczSrc++;
int nExp= 0;
while ( (*pczSrc >= _T('0')) && (*pczSrc <= _T('9')) )
{
nExp= nExp*10 + (*pczSrc-_T('0'));
pczSrc++;
}
if (bNegativeExp)
nExp= -nExp;
// todo: return false if exponent / number of digits of exponent is too large
f*= pow(10., nExp);
}
if (bNegative)
f= -f;
return true;
}
As long as you are not using a particularly bad standard library (impossible these times, they are all good) it's not possible to do it faster than atof
.
I don't see any reason why strod()
should call strlen()
. Of course it might, but nothing in its specification requires it and I'd be suprised if it did. And I'd say that strtod()
about as fast as you'll get, short of writing some FPU processor-specific stuff yourself.
Why do you think atof, strtod use strlen? I've never implemented them, but I can't imagine why they'd need to know the length of the input string. It would be of no value to them. I'd use strtod as per Jason's answer. That's what it's for.
And yes, if you have a very large amount of text, it's going to take some time to convert. That's just the way it is.
Use strtod
. It almost certainly does not call strlen
. Why would it need to know the length of the input? It merely runs past leading whitespace, then consumes as many characters as possible that make sense for a floating point literal, and then returns a pointer just past that. You can see an example implementation Perhaps you're using it non-optimally? Here's a sample of how to use strtod
:
#include <stdio.h>
#include <stdlib.h>
int main() {
char *p = "1.txt 194.919 -241.808 234.896 195.569 -246.179 234.482 194.919 -241.808 234.896";
char *end = p;
char *q;
double d;
while(*end++ != ' '); // move past "1.txt"
do {
q = end;
d = strtod(q, &end);
printf("%g\n", d);
} while(*end != '\0');
}
This outputs:
194.919
-241.808
234.896
195.569
-246.179
234.482
194.919
-241.808
234.896
on my machine.
As others have said, I don't think you're going to do much better than the standard library calls. They have been around for a long time and are quite highly optimized (well, they should be, at least in good implementations).
That said, there are some things that aren't clear to me. Are you reading the whole file into memory and then converting the array to another array? If so, you might want to check that the system you are running on has enough memory to do that with swapping. If you are doing this, would it be possible to just convert one line at a time as you read them off disk instead of storing them?
You could consider multithreading your program. One thread to read and buffer lines off disk, and n threads to process the lines. Dr. Dobb's Journal published a great single-reader/single-writer lockless queue implementation you could use. I've used this in a similar app. My worker threads each have an input queue, and then reader thread reads data off disk and places them into these queues in round robin style.
How about something like:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static float frac[] =
{
0.000,
0.001,
0.002,
... // fill in
0.997,
0.998,
0.999,
};
static float exp[] =
{
1e-38,
1e-37,
1e-36,
... // fill in
1e+36,
1e+37,
1e+38,
};
float cvt(char* p)
{
char* d = strchr(p, '.'); // Find the decimal point.
char* e = strchr(p, 'e'); // Find the exponent.
if (e == NULL)
e = strchr(p, 'E');
float num = atoi(p);
if (num > 0) {
num += frac[atoi(d + 1)];
} else {
num -= frac[atoi(d + 1)];
}
if (e)
num *= exp[atoi(e)];
return num;
}
int main()
{
char line[100];
while(gets(line)) {
printf("in %s, out %g\n", line, cvt(line));
}
}
Should be good to three significant digits.
Edit: watch out for big mantissas.
Edit again: and negative exponents. :-(
I doubt if strlen
is costing you much.
If you can take advantage of your numbers falling in a relatively restricted range, then what I suggest is to parse it yourself, doing as little computation as possible, such as:
#define DIGIT(c) ((c)>='0' && (c)<='9')
BOOL parseNum(char* *p0, float *f){
char* p = *p0;
int n = 0, frac = 1;
BOOL bNeg = FALSE;
while(*p == ' ') p++;
if (*p == '-'){p++; bNeg = TRUE;}
if (!(DIGIT(*p) || *p=='.')) return FALSE;
while(DIGIT(*p)){
n = n * 10 + (*p++ - '0');
}
if (*p == '.'){
p++;
while(DIGIT(*p)){
n = n * 10 + (*p++ - '0');
frac *= 10;
}
}
*f = (float)n/(float)frac;
if (bNeg) *f = -*f;
*p0 = p;
return TRUE;
}
精彩评论