Fast way of converting a float in the range -1 to 1 to a short?
I need to repeatedly convert 1024+ consecutive 4-byte floats (range -1 to 1) to 2-byte shorts (range -32768 to 32767) and write them to disk.
Currently I do this with a loop:
short v = 0;
for (unsigned int sample = 0; sample < length; sample++)
{
    v = (short)(inbuffer[sample * 2] * 32767.0f);
    fwrite(&v, 2, 1, file);
}
This works, but the floating-point calculation and the loop are expensive. Is there any way this could be optimized?
short v = 0;
for (unsigned int sample = 0; sample < length; sample++)
{
    v = (short)(inbuffer[sample * 2] * 32767.0f);
    // The problem is not here ------^^^^^^^^^^
    fwrite(&v, 2, 1, file);
    // ^^^^^^ it is here
}
A typical Mac (given the objective-c tag; or are we talking about the iPhone here?) can do billions of floating-point multiplications per second. fwrite, however, is a library call that goes through several layers of indirection to copy its data into a buffer and possibly flush it. It is better to fill your own buffer and write it in one batch:
short v[SZ] = {0};
// SZ must be >= length; otherwise allocate a working buffer on the heap.
for (unsigned int sample = 0; sample < length; sample++)
{
    v[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}
fwrite(v, sizeof v[0], length, file); // write only the samples that were filled
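If length can exceed any fixed buffer size, one way to keep the batching idea without a heap allocation is to convert and write in fixed-size chunks. This is a minimal sketch, assuming the input is interleaved as in the question (every other float is used) and that the shorts are written in the machine's native byte order; the function name write_samples and the chunk size CHUNK are hypothetical choices for illustration:

```c
#include <stddef.h>
#include <stdio.h>

#define CHUNK 4096 /* hypothetical chunk size; any reasonable value works */

/* Converts inbuffer[0], inbuffer[2], ..., inbuffer[2 * (length - 1)] from
   floats in [-1, 1] to shorts and writes them to file with one fwrite per
   chunk instead of one per sample. Returns the number of samples written. */
size_t write_samples(FILE *file, const float *inbuffer, size_t length)
{
    short v[CHUNK];
    size_t written = 0;

    while (written < length) {
        size_t n = length - written;
        if (n > CHUNK)
            n = CHUNK;

        /* Fill the local buffer; no library calls inside this loop. */
        for (size_t i = 0; i < n; i++)
            v[i] = (short)(inbuffer[(written + i) * 2] * 32767.0f);

        size_t ok = fwrite(v, sizeof v[0], n, file);
        written += ok;
        if (ok != n)
            break; /* write error: stop and report how far we got */
    }
    return written;
}
```

A single call such as write_samples(file, inbuffer, length) then replaces the per-sample loop.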
I would have thought the repeated calls to fwrite would be the expensive part. How about:
short outbuffer[length]; // note: this is a C99 variable-length array; if length isn't constant and your compiler doesn't support VLAs (or length is large enough to overflow the stack), malloc this instead.
for (unsigned int sample = 0; sample < length; sample++)
{
    outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);
}
fwrite(outbuffer, sizeof *outbuffer, length, file);
I suspect the bottleneck of your loop is not the float-to-short conversion but writing the output to the file. Try moving the file output out of the loop:
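// requires <stdlib.h> for malloc and free
short *outbuffer = malloc(length * sizeof *outbuffer); // create outbuffer of required size
if (outbuffer != NULL)
{
    for (unsigned int sample = 0; sample < length; sample++)
        outbuffer[sample] = (short)(inbuffer[sample * 2] * 32767.0f);

    fwrite(outbuffer, sizeof *outbuffer, length, file); // one call for all samples
    free(outbuffer);
}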
You could try something like this:
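uint32_t bits; memcpy(&bits, &in[i], sizeof bits); // memcpy avoids the strict-aliasing UB of casting float * to uint32_t *
out[i] = table[bits >> 16];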
where table is a lookup table that maps the upper 16 bits of an IEEE float to the int16_t value you want. However, that will lose some precision, because the upper 16 bits hold only the sign bit, the 8 exponent bits, and the top 7 mantissa bits. You'd need to keep and use 23 bits (1 sign bit, 8 exponent bits, and 14 mantissa bits) for full precision, and that means a 16 MB table (2^23 entries of 2 bytes each), which will wreck cache locality and thus performance.
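A sketch of the 16-bit table approach, assuming IEEE 754 single-precision floats; build_table and convert_table are hypothetical names. The float bits are read with memcpy, which compilers typically turn into a plain load and which avoids the strict-aliasing problem of a pointer cast. Because the low 16 bits are ignored, each input is effectively truncated to 7 mantissa bits before conversion, which is the precision loss described above:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per possible upper half of a float: 65536 * 2 bytes = 128 KB. */
static int16_t table[65536];

/* Must be called once before convert_table. */
static void build_table(void)
{
    for (uint32_t hi = 0; hi < 65536; hi++) {
        uint32_t bits = hi << 16; /* the low 16 bits of the float are zero */
        float f;
        memcpy(&f, &bits, sizeof f);

        if (f != f)             /* NaN */
            f = 0.0f;
        else if (f > 1.0f)      /* clamp out-of-range inputs */
            f = 1.0f;
        else if (f < -1.0f)
            f = -1.0f;

        table[hi] = (int16_t)(f * 32767.0f);
    }
}

static void convert_table(int16_t *out, const float *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &in[i], sizeof bits);
        out[i] = table[bits >> 16]; /* index by sign, exponent, top 7 mantissa bits */
    }
}
```

For example, 0.1f converts to 3263 here rather than the 3276 that (short)(0.1f * 32767.0f) gives, because its mantissa is cut down to 0.099609375 before the lookup.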
Are you sure that the floating-point conversions are slow? As long as you're calling fwrite once per sample, you're spending a good 50-100 times as much CPU time in fwrite as on floating-point arithmetic. If you deal with that and the code is still too slow, you could convert to int16_t by adding a magic bias and reading off the mantissa bits instead of multiplying by 32767.0. That might or might not be faster.
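A sketch of the magic-bias idea, assuming IEEE 754 single precision, the default round-to-nearest mode, and float arithmetic actually done in single precision (FLT_EVAL_METHOD == 0, as with SSE on x86-64). convert_bias and MAGIC_BIAS are hypothetical names, and 384.0f is one bias that works for this input range. It scales by 32768 and rounds to nearest, so it does not reproduce the question's (short)(x * 32767.0f) bit-for-bit, and an input of exactly 1.0f has to be clamped:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 384.0f = 1.5 * 2^8. Floats in [256, 512) are spaced 2^(8 - 23) = 2^-15
   apart, so for x in [-1, 1] the sum x + 384 lands in [383, 385] and its
   low mantissa bits hold round(x * 32768) relative to 384 itself. */
#define MAGIC_BIAS 384.0f

static void convert_bias(int16_t *out, const float *in, size_t n)
{
    float bias = MAGIC_BIAS;
    uint32_t base_bits;
    memcpy(&base_bits, &bias, sizeof base_bits); /* 0x43C00000 */
    int32_t base = (int32_t)base_bits;           /* positive, fits in int32 */

    for (size_t i = 0; i < n; i++) {
        float biased = in[i] + MAGIC_BIAS;       /* the FPU does the rounding */
        uint32_t bits;
        memcpy(&bits, &biased, sizeof bits);

        int32_t v = (int32_t)bits - base;        /* round(in[i] * 32768) */
        if (v > 32767)                           /* only in[i] == 1.0f reaches 32768 */
            v = 32767;
        out[i] = (int16_t)v;
    }
}
```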