Does anyone know how to convert a huge char array to float with better performance than atof/strtod/sscanf?

2022-12-15 18:42 Q&A author:

I have a huge char array, char p[n], read from a text file like this:

//1.txt
194.919 -241.808 234.896
195.569 -246.179 234.482
194.919 -241.808 234.896
...

void foo(const char *p, float *x, float *y, float *z) {
    /* convert the text in p into x, y, z */
}

I tried atof and strtod, but they are really time-consuming when the array is huge, because they call strlen(). sscanf is also very slow.

I debugged into the code and found that both atof() and strtod() call strlen() in Visual Studio; this can be checked in the CRT source.

strtod() call:
        answer = _fltin2( &answerstruct, ptr, (int)strlen(ptr), 0, 0, _loc_update.GetLocaleT());


atof() call:
        return( *(double *)&(_fltin2( &fltstruct, nptr, (int)strlen(nptr), 0, 0, _loc_update.GetLocaleT())->dval) );

I also tried strtok, but we must not modify any of the data read from 1.txt.

So does anyone know the best way to convert all of these to float x, y, z?

Visual Studio 2008 + Windows 7
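To make the cost described above concrete: if every conversion call first runs strlen() on the rest of the buffer, as the CRT snippets suggest, the total number of bytes scanned grows roughly quadratically with the size of the input. The sketch below only counts those bytes; scanned_bytes is a hypothetical helper written for illustration, not part of the CRT.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how many bytes strlen() would touch if it were
   called once at the start of every whitespace-separated token in buf. */
size_t scanned_bytes(const char *buf)
{
    size_t total = 0;
    const char *p = buf;

    while (*p != '\0') {
        /* skip separators */
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;

        total += strlen(p);   /* what a per-call strlen() would scan */

        /* skip the token itself */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
    }
    return total;
}
```

For a buffer of n bytes holding k evenly spaced numbers this comes to about n*k/2 bytes, which is why the slowdown only becomes visible on very large inputs.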

If you can make additional assumptions about the format of the floating point values, parsing them yourself might increase performance.

Example code for parsing ' ' or '\n'-separated values without exponents and no input validation:

float parsef(const char **str)
{
    const char *cc = *str;

    _Bool neg = (*cc == '-');
    if(neg) ++cc;

    float value = 0, e = 1;

    for(; *cc != '.'; ++cc)
    {
        if(*cc == ' ' || *cc == '\n' || !*cc)
        {
            *str = cc;
            return neg ? -value : value;
        }

        value *= 10;
        value += *cc - '0';
    }

    for(++cc;; ++cc)
    {
        if(*cc == ' ' || *cc == '\n' || !*cc)
        {
            *str = cc;
            return neg ? -value : value;
        }

        e /= 10;
        value += (*cc - '0') * e;
    }
}

Example usage:

const char *str = "42 -15.4\n23.001";
do printf("%f\n", parsef(&str));
while(*str++);

Okay, how about doing the tokenization yourself and then calling strtod?

What I'm thinking is something like this:

char *current = ...;  // initialized to the head of your character array
while (*current != '\0')
{
    char buffer[64];
    unsigned int idx = 0;

    // copy over current number
    while (*current != '\0' && !isspace(*current))
    {
        buffer[idx++] = *current++;
    }
    buffer[idx] = '\0';

    // move forward to next number
    while (*current != '\0' && isspace(*current))
    {
        current++;
    }

    // use strtod to convert buffer   
}

One issue with this is that the tokenization is very simple. It will work for the format you posted, but if the format varies (say, another line uses : to separate the numbers), it won't work.

Another issue is that the code assumes all numbers have < 64 characters. If they are longer, you'll get a buffer overflow.
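A bounded variant of the copying loop can be sketched as follows. It is a minimal illustration rather than a drop-in solution: convert_copy, out and max_out are names assumed here, and over-long tokens are truncated to 63 characters instead of overflowing the buffer.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts up to max_out whitespace-separated numbers
   from text into out[] and returns how many were stored.
   Tokens longer than 63 characters are truncated instead of overflowing. */
size_t convert_copy(const char *text, float *out, size_t max_out)
{
    const char *current = text;
    size_t count = 0;

    /* skip leading whitespace */
    while (*current != '\0' && isspace((unsigned char)*current))
        current++;

    while (*current != '\0' && count < max_out) {
        char buffer[64];
        size_t idx = 0;

        /* copy the current token, leaving room for the terminator */
        while (*current != '\0' && !isspace((unsigned char)*current)) {
            if (idx < sizeof buffer - 1)
                buffer[idx++] = *current;
            current++;
        }
        buffer[idx] = '\0';

        /* strtod now only sees the short copy, not the whole array */
        out[count++] = (float)strtod(buffer, NULL);

        /* move forward to the next token */
        while (*current != '\0' && isspace((unsigned char)*current))
            current++;
    }
    return count;
}
```

The casts to unsigned char matter because passing a negative char value to isspace is undefined behavior.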

Also, the copying to a temporary buffer will add some overhead (but hopefully less than the overhead of constantly doing a strlen on the entire buffer). I know you said you can't change the original buffer, but can you make a temporary change, i.e. the buffer can change as long as you restore it to its original state before you return? Something like this:

char *current = ...;  // initialized to the head of your character array
while (*current != '\0')
{
    char *next_sep = current;
    while (*next_sep != '\0' && !isspace(*next_sep))
    {
        next_sep++;
    }

    // save the separator before overwriting it
    char tmp = *next_sep;
    *next_sep = '\0';

    // use strtod on current

    // restore the separator
    *next_sep = tmp;

    current = next_sep;

    // move forward to next number
    while (*current != '\0' && isspace(*current))
    {
        current++;
    }
}

This technique means no copying and no worries about buffer overflow. You do need to temporarily modify the buffer; hopefully that is acceptable.
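Filled in as a complete function, the in-place variant above might look like the sketch below. The name convert_in_place and its parameters are assumptions made for illustration; the buffer must be writable (not a string literal or a read-only mapping) and must not be read by another thread while the function runs.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: converts up to max_out numbers from text into out[].
   text is modified temporarily and is unchanged when the function returns. */
size_t convert_in_place(char *text, float *out, size_t max_out)
{
    char *current = text;
    size_t count = 0;

    /* skip leading whitespace */
    while (*current != '\0' && isspace((unsigned char)*current))
        current++;

    while (*current != '\0' && count < max_out) {
        char *next_sep = current;
        char saved;

        /* find the end of the current token */
        while (*next_sep != '\0' && !isspace((unsigned char)*next_sep))
            next_sep++;

        saved = *next_sep;    /* remember the separator */
        *next_sep = '\0';     /* terminate the token temporarily */
        out[count++] = (float)strtod(current, NULL);
        *next_sep = saved;    /* restore the buffer */

        /* move forward to the next token */
        current = next_sep;
        while (*current != '\0' && isspace((unsigned char)*current))
            current++;
    }
    return count;
}
```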

Check out this code.

It can be further optimized if there's no need to support scientific representation, '+' sign, or leading tabs.

It doesn't use strlen, or any other standard library string routine.

// convert a floating-point value in string representation to its numerical value
// returns false if the input is not a number
// F is float/double
// T is char or wchar_t
// '1234.567' -> 1234.567
template <class F, class T> inline bool StrToDouble(const T* pczSrc, F& f)
{
    f= 0;

    if (!pczSrc)
        return false;

    while ((32 == *pczSrc) || (9 == *pczSrc))
        pczSrc++;

    bool bNegative= (_T('-') == *pczSrc);

    if ( (_T('-') == *pczSrc) || (_T('+') == *pczSrc) )
        pczSrc++;

    if ( (*pczSrc < _T('0')) || (*pczSrc > _T('9')) )
        return false;

    // todo: return false if number of digits is too large

    while ( (*pczSrc >= _T('0')) && (*pczSrc<=_T('9')) )
    {
        f= f*10. + (*pczSrc-_T('0'));
        pczSrc++;
    }

    if (_T('.') == *pczSrc)
    {
        pczSrc++;

        double e= 0.;
        double g= 1.;

        while ( (*pczSrc >= _T('0')) && (*pczSrc<=_T('9')) )
        {
            e= e*10. + (*pczSrc-_T('0'));
            g= g*10.                    ;
            pczSrc++;
        }

        f+= e/g;
    }

    if ( (_T('e') == *pczSrc) || (_T('E') == *pczSrc) ) // exponent, such as in 7.32e-2
    {
        pczSrc++;

        bool bNegativeExp= (_T('-') == *pczSrc);

        if ( (_T('-') == *pczSrc) || (_T('+') == *pczSrc) )
            pczSrc++;

        int nExp= 0;
        while ( (*pczSrc >= _T('0')) && (*pczSrc <= _T('9')) )
        {
            nExp= nExp*10 + (*pczSrc-_T('0'));
            pczSrc++;
        }

        if (bNegativeExp)
            nExp= -nExp;

        // todo: return false if exponent / number of digits of exponent is too large

        f*= pow(10., nExp);
    }

    if (bNegative)
        f= -f;

    return true;
}

As long as you are not using a particularly bad standard library (unlikely these days, they are all good), it's not possible to do it faster than atof.

I don't see any reason why strtod() should call strlen(). Of course it might, but nothing in its specification requires it and I'd be surprised if it did. And I'd say that strtod() is about as fast as you'll get, short of writing some FPU processor-specific stuff yourself.

Why do you think atof, strtod use strlen? I've never implemented them, but I can't imagine why they'd need to know the length of the input string. It would be of no value to them. I'd use strtod as per Jason's answer. That's what it's for.

And yes, if you have a very large amount of text, it's going to take some time to convert. That's just the way it is.

Use strtod. It almost certainly does not call strlen. Why would it need to know the length of the input? It merely runs past leading whitespace, then consumes as many characters as possible that make sense for a floating point literal, and then returns a pointer just past that. You can see an example implementation. Perhaps you're using it non-optimally? Here's a sample of how to use strtod:

#include <stdio.h>
#include <stdlib.h>
int main() {
    char *p = "1.txt 194.919 -241.808 234.896 195.569 -246.179 234.482 194.919 -241.808 234.896";
    char *end = p;
    char *q;
    double d;
    while(*end++ != ' '); // move past "1.txt"
    do {
        q = end;
        d = strtod(q, &end);
        if (end == q) break;   // nothing was converted; avoid looping forever
        printf("%g\n", d);
    } while(*end != '\0');
}

This outputs:

194.919
-241.808
234.896
195.569
-246.179
234.482
194.919
-241.808
234.896

on my machine.
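Building on the strtod loop above, the end pointer can also drive a loop that fills the x, y, z values the question asks for. This is a sketch under assumptions: parse_points and the separate x/y/z arrays are hypothetical, and on a CRT where strtod really does call strlen internally this does not remove that cost.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: reads up to max_points "x y z" triples from text
   using strtod's end pointer, and returns the number of complete triples. */
size_t parse_points(const char *text, float *x, float *y, float *z,
                    size_t max_points)
{
    const char *p = text;
    size_t n = 0;

    while (n < max_points) {
        double v[3];
        int i;

        for (i = 0; i < 3; i++) {
            char *end;
            v[i] = strtod(p, &end);   /* skips leading whitespace itself */
            if (end == p)             /* nothing converted: end of input or bad text */
                return n;
            p = end;
        }
        x[n] = (float)v[0];
        y[n] = (float)v[1];
        z[n] = (float)v[2];
        n++;
    }
    return n;
}
```

An incomplete trailing triple is dropped rather than stored, so the three arrays always stay the same length.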

As others have said, I don't think you're going to do much better than the standard library calls. They have been around for a long time and are quite highly optimized (well, they should be, at least in good implementations).

That said, there are some things that aren't clear to me. Are you reading the whole file into memory and then converting the array to another array? If so, you might want to check that the system you are running on has enough memory to do that without swapping. If you are doing this, would it be possible to just convert one line at a time as you read them off disk instead of storing them?
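Converting one line at a time could look roughly like the sketch below. It assumes each line holds exactly three numbers, as in the sample file; parse_line and convert_stream are hypothetical names, and lines longer than the 256-byte buffer are not handled. Because every string passed to strtod is a single short line, any internal strlen stays cheap.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parses one "x y z" line.
   Returns 1 on success, 0 if fewer than three numbers were found. */
int parse_line(const char *line, float *x, float *y, float *z)
{
    char *end;
    double vx, vy, vz;

    vx = strtod(line, &end);
    if (end == line) return 0;
    line = end;

    vy = strtod(line, &end);
    if (end == line) return 0;
    line = end;

    vz = strtod(line, &end);
    if (end == line) return 0;

    *x = (float)vx;
    *y = (float)vy;
    *z = (float)vz;
    return 1;
}

/* Reads the stream line by line, so only one short line is held at a time.
   Returns the number of lines that parsed successfully. */
long convert_stream(FILE *in)
{
    char line[256];
    long count = 0;
    float x, y, z;

    while (fgets(line, sizeof line, in) != NULL) {
        if (parse_line(line, &x, &y, &z))
            count++;   /* a real program would store or use x, y, z here */
    }
    return count;
}
```

A caller would open the file with fopen("1.txt", "r"), pass the FILE pointer to convert_stream, and close it afterwards.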

You could consider multithreading your program: one thread to read and buffer lines off disk, and n threads to process the lines. Dr. Dobb's Journal published a great single-reader/single-writer lockless queue implementation you could use. I've used this in a similar app. My worker threads each have an input queue, and the reader thread reads data off disk and places it into these queues in round-robin style.
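For readers who have not seen such a queue, here is a minimal single-producer/single-consumer ring buffer. It is not the Dr. Dobb's implementation mentioned above, only a generic sketch of the idea; it assumes a C11 compiler with <stdatomic.h> (which Visual Studio 2008 does not provide), and spsc_queue, QUEUE_CAPACITY and the function names are made up for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

#define QUEUE_CAPACITY 1024   /* assumed size; must be a power of two */

/* Single-producer/single-consumer ring buffer of line pointers. */
typedef struct {
    const char *slots[QUEUE_CAPACITY];
    atomic_size_t head;   /* next slot to read; written only by the consumer */
    atomic_size_t tail;   /* next slot to write; written only by the producer */
} spsc_queue;

void spsc_init(spsc_queue *q)
{
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns 1 if the line was queued, 0 if the queue is full. */
int spsc_push(spsc_queue *q, const char *line)
{
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head == QUEUE_CAPACITY)
        return 0;
    q->slots[tail & (QUEUE_CAPACITY - 1)] = line;
    /* release: the slot write becomes visible before the new tail */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 1 and stores a line in *line, or 0 if empty. */
int spsc_pop(spsc_queue *q, const char **line)
{
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head == tail)
        return 0;
    *line = q->slots[head & (QUEUE_CAPACITY - 1)];
    /* release: the slot read finishes before the slot is handed back */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return 1;
}
```

In the layout described above, the reader thread would own one such queue per worker and call spsc_push on them in turn, while each worker calls spsc_pop on its own queue and converts the lines it receives. The lines themselves must stay alive until the worker is done with them.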

How about something like:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static float frac[] =
{
    0.000,
    0.001,
    0.002,
    ...               // fill in
    0.997,
    0.998,
    0.999,
};

static float exp[] =
{
    1e-38,
    1e-37,
    1e-36,
    ...               // fill in
    1e+36,
    1e+37,
    1e+38,
};

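float cvt(char* p)
{
    char* d = strchr(p, '.');   // Find the decimal point.
    char* e = strchr(p, 'e');   // Find the exponent.
    if (e == NULL)
        e = strchr(p, 'E');

    float num = atoi(p);        // Integer part.
    if (d) {
        // Assumes exactly three fractional digits, matching the frac[] table.
        // Test the sign character, since atoi() yields 0 for inputs like "-0.5".
        if (*p == '-')
            num -= frac[atoi(d + 1)];
        else
            num += frac[atoi(d + 1)];
    }
    if (e)
        num *= exp[atoi(e + 1) + 38];   // exp[0] is 1e-38, so offset the index by 38.
    return num;
}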

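int main()
{
    char line[100];
    // fgets instead of gets: gets cannot limit how much it writes into line
    while(fgets(line, sizeof line, stdin)) {
        line[strcspn(line, "\n")] = '\0';   // strip the trailing newline
        printf("in %s, out %g\n", line, cvt(line));
    }
}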

Should be good to three significant digits.

Edit: watch out for big mantissas.

Edit again: and negative exponents. :-(

I doubt if strlen is costing you much.

If you can take advantage of your numbers falling in a relatively restricted range, then what I suggest is to parse it yourself, doing as little computation as possible, such as:

#define DIGIT(c) ((c)>='0' && (c)<='9')

BOOL parseNum(char* *p0, float *f){
  char* p = *p0;
  int n = 0, frac = 1;
  BOOL bNeg = FALSE;
  while(*p == ' ') p++;
  if (*p == '-'){p++; bNeg = TRUE;}
  if (!(DIGIT(*p) || *p=='.')) return FALSE;
  while(DIGIT(*p)){
    n = n * 10 + (*p++ - '0');
  }
  if (*p == '.'){
    p++;
    while(DIGIT(*p)){
      n = n * 10 + (*p++ - '0');
      frac *= 10;
    }
  }
  *f = (float)n/(float)frac;
  if (bNeg) *f = -*f;
  *p0 = p;
  return TRUE;
}
