Fastest way to convert unsigned char 8 bits to actual numbers
I am using an unsigned char to store 8 flags. Each flag represents the corner of a cube. So 00000001 will be corner 1, 01000100 will be corners 3 and 7, etc. My current solution is to & the result with 1, 2, 4, 8, 16, 32, 64 and 128, check whether the result is not zero and store the corner. That is, if (result & 1) corners.push_back(1);. Any chance I can get rid of that 'if' statement? I was hoping I could get rid of it with bitwise operators but I could not think of any.
A little background on why I want to get rid of the if statement. This cube is actually a Voxel which is part of a grid that is at least 512x512x512 in size. That is more than 134 million Voxels. I am performing calculations on each one of the Voxels (well, not exactly, but I won't go into too much detail as it is irrelevant here) and that is a lot of calculations. And I need to perform these calculations per frame. Any speed boost, however minuscule per function call, will help with this many calculations. To give you an idea, my algorithm (at some point) needed to determine whether a float was negative, positive or zero (within some error). I had if statements in there and greater/smaller than checks. I replaced that with a fast float to int function and shaved off a quarter of a second. Currently, each frame in a 128x128x128 grid takes a little more than 4 seconds.
I would consider a different approach to it entirely: there are only 256 possibilities for different combinations of flags. Precalculate 256 vectors and index into them as needed.
std::vector<std::vector<int> > corners(256);
// Build the table once: corners[i] lists the mask value of every bit set in i.
for (int i = 0; i < 256; ++i) {
    std::vector<int>& v = corners[i];
    if (i & 1) v.push_back(1);
    if (i & 2) v.push_back(2);
    if (i & 4) v.push_back(4);
    if (i & 8) v.push_back(8);
    if (i & 16) v.push_back(16);
    if (i & 32) v.push_back(32);
    if (i & 64) v.push_back(64);
    if (i & 128) v.push_back(128);
}
// Per voxel: a single table lookup, with no conditionals on the flag bits.
for (int i = 0; i < NumVoxels(); ++i) {
    unsigned char flags = GetFlags(i);
    const std::vector<int>& v = corners[flags];
    ... // do whatever with v
}
This would avoid all the conditionals and having push_back call new, which I suspect would be more expensive anyway.
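As a variation on the precalculated table, the entries could be fixed-size arrays instead of vectors, so that nothing lives on the heap. The following is only a sketch under assumptions: the names CornerList, buildCornerTable and cornersFor are hypothetical, and it stores corner numbers 1 to 8 (as in the question) rather than mask values.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper type: up to 8 corner numbers plus a count, no heap use.
struct CornerList {
    std::array<std::uint8_t, 8> corner{};  // corner numbers 1..8, lowest first
    std::uint8_t count = 0;                // number of valid entries in corner
};

// Build all 256 entries once. The conditionals run only here, not per voxel.
inline std::array<CornerList, 256> buildCornerTable() {
    std::array<CornerList, 256> table{};
    for (int flags = 0; flags < 256; ++flags) {
        CornerList& entry = table[flags];
        for (int bit = 0; bit < 8; ++bit) {
            if (flags & (1 << bit)) {
                entry.corner[entry.count++] = static_cast<std::uint8_t>(bit + 1);
            }
        }
    }
    return table;
}

// Per-voxel lookup: a single array index, with no tests on the flag bits.
inline const CornerList& cornersFor(std::uint8_t flags) {
    static const std::array<CornerList, 256> table = buildCornerTable();
    return table[flags];
}
```

With this layout, cornersFor(0x44) would give a list with count 2 holding corners 3 and 7. Whether it beats the vector version in practice would need measuring on the real workload.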
If there's some operation that needs to be done if the bit is set and not if it's not, it seems you'll have to have a conditional of some kind somewhere. If it could be expressed as a calculation somehow, you could get around it like this, for example:
numCorners = ((result >> 0) & 1) + ((result >> 1) & 1) + ((result >> 2) & 1) + ...
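Written out in full, the sum above might look like the following sketch. The function name countCorners is an assumption for illustration. The loop has a fixed trip count of 8, which compilers will usually unroll, and each term is already 0 or 1, so no test on the flag bits is needed.

```cpp
#include <cstdint>

// Count the set flags by adding up the individual bits.
// (flags >> bit) & 1 is 1 when the bit is set and 0 otherwise.
inline int countCorners(std::uint8_t flags) {
    int n = 0;
    for (int bit = 0; bit < 8; ++bit) {
        n += (flags >> bit) & 1;
    }
    return n;
}
```

For example, countCorners(0x44) adds the bits of 01000100 and returns 2.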
Hacker's Delight, first page:
x & (-x) // isolates the lowest set bit
x & (x - 1) // clears the lowest set bit
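Combining the two expressions gives a loop that runs once per set bit rather than once per possible bit. This is a sketch only: the helper name appendCorners is hypothetical, and the small inner loop that turns the isolated bit into a corner number could be replaced by a lookup table or, in C++20, by std::countr_zero from <bit>.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: append the corner numbers (1..8) encoded in flags.
inline void appendCorners(std::uint8_t flags, std::vector<int>& corners) {
    unsigned x = flags;
    while (x != 0) {
        unsigned lowest = x & (0u - x);  // isolate the lowest set bit
        int corner = 1;
        while ((lowest >> corner) != 0) {
            ++corner;                    // corner = bit index + 1
        }
        corners.push_back(corner);
        x &= x - 1;                      // clear the lowest set bit
    }
}
```

For flags equal to 01000100, the loop body runs twice and appends 3, then 7. There is still a loop condition, so this reduces the number of branches rather than removing them.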
Inlining your push_back method would also help (better still, create a function that receives all the flags together).
Usually if you need performance, you should design the whole system with that in mind. Maybe if you post more code it will be easier to help.
EDIT: here is a nice idea:
unsigned char LOG2_LUT[256] = {...};
int t;
switch (count_set_bits(flags)) {
    case 8: t = flags;
            flags &= (flags - 1); // clear the lowest set bit
            t ^= flags;           // t now holds only the bit that was cleared
            corners.push_back(LOG2_LUT[t]);
            // fall through
    case 7: t = flags;
            flags &= (flags - 1);
            t ^= flags;
            corners.push_back(LOG2_LUT[t]);
            // fall through
    case 6: t = flags;
            flags &= (flags - 1);
            t ^= flags;
            corners.push_back(LOG2_LUT[t]);
            // fall through
    // etc...
}
count_set_bits() is a well-known function: http://www-graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
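For reference, a table-based count_set_bits for a single byte could be sketched as below. The table is filled with the recurrence that the linked page gives for generating it (the count for i is the count for i / 2 plus the lowest bit of i). The helper name buildBitCountTable and the use of std::array are assumptions made for this sketch.

```cpp
#include <array>
#include <cstdint>

// Fill a 256-entry table where table[i] is the number of set bits in i.
// Entry 0 stays 0; every other entry reuses the already computed entry i / 2.
inline std::array<std::uint8_t, 256> buildBitCountTable() {
    std::array<std::uint8_t, 256> table{};
    for (int i = 1; i < 256; ++i) {
        table[i] = static_cast<std::uint8_t>((i & 1) + table[i / 2]);
    }
    return table;
}

// Number of set bits in one byte, found with a single table lookup.
inline int count_set_bits(std::uint8_t flags) {
    static const std::array<std::uint8_t, 256> table = buildBitCountTable();
    return table[flags];
}
```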
There is a way, it's not "pretty", but it works.
(result & 1) && (corners.push_back(1), true);
(result & 2) && (corners.push_back(2), true);
(result & 4) && (corners.push_back(3), true);
(result & 8) && (corners.push_back(4), true);
(result & 16) && (corners.push_back(5), true);
(result & 32) && (corners.push_back(6), true);
(result & 64) && (corners.push_back(7), true);
(result & 128) && (corners.push_back(8), true);
It uses a little-known feature of the C++ language: short-circuit evaluation of &&. Because push_back returns void, each call is wrapped in a comma expression so that the right-hand operand has a value.
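Short-circuit evaluation removes the if keyword, but the compiler typically still emits a conditional jump for it. A different technique, not the one in the answer above, is to store the corner unconditionally and advance the write position only when the bit is set. The sketch below assumes a caller-provided buffer of 8 slots; the name extractCorners is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Writes the corner numbers (1..8) whose bits are set in flags to the front
// of out and returns how many were written. Slots at or beyond the returned
// count may hold leftover values and should be ignored.
inline int extractCorners(std::uint8_t flags, std::array<std::uint8_t, 8>& out) {
    int n = 0;
    for (int bit = 0; bit < 8; ++bit) {
        out[n] = static_cast<std::uint8_t>(bit + 1);  // always store
        n += (flags >> bit) & 1;                      // keep it only if the bit is set
    }
    return n;
}
```

The index n never exceeds 7 at the time of a store, so the writes stay inside the buffer. For flags equal to 01000100 the function returns 2 with out[0] == 3 and out[1] == 7. Whether trading the branch for extra stores pays off depends on the hardware and on how predictable the flags are, so it would need profiling.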
I've noted a similar algorithm in the OpenTTD code. It turned out to be utterly useless: you're better off not breaking down numbers like that. Instead, replace the iteration over the vector<> you have now with an iteration over the bits of the byte. This is far more cache-friendly.
I.e.
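#include <climits> // UCHAR_MAX
unsigned char flags = Foo(); // the value you didn't put in a vector<>
// c starts at the highest bit of the byte and shifts down until it reaches 0
for (unsigned char c = (UCHAR_MAX >> 1) + 1; c != 0; c >>= 1)
{
    if (flags & c)
        Bar(flags & c);
}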