Any smarter way to extract from array of bits?
I have areas of memory that could be considered "array of bits". They are equivalent to
unsigned char arr[256];
But it would be be开发者_如何转开发tter thought of as
bit arr[2048];
I'm accessing separate bits from it with
#define GETBIT(x,in) ((in)[ ((x)/8) ] & 1<<(7-((x)%8)))
but I do it a lot in many places of the code, often in performance-critical sections and I wonder if there are any smarter, more optimal methods to do it.
extra info: Architecture: ARM9 (32 bit); gcc/Linux. The physical data representation can't be changed - it is externally provided or mapped for external use.
I don't think so. In fact, many CPU architectures won't access bits individually.
On C++ you have std::bitset<N>
. but may not have highest-performance depending on your compiler's implementation and optimization.
BTW, it may be better to group your bit-array as uint32_t[32]
(or uint64_t[16]
) for aligned dereferencing (which bitset
does this for you already).
For randomly accessing individual bits, the macro you've suggested is as good as you're going to get (as long as you turn on optimisations in your compiler).
If there is any pattern at all to the bits you're accessing, then you may be able to do better. For example, if you often access pairs of bits, then you may see some improvement by providing a method to get two bits instead of one, even if you don't always end up using both bits.
As with any optimisation problem, you will need to be very familiar with the behaviour of your code, in particular its access patterns in your bit array, to make a meaningful improvement in performance.
Update: Since you access ranges of bits, you can probably squeeze some more performance out of your macros. For example, if you need to access four bits you might have macros like this:
#define GETBITS_0_4(x,in) (((in)[(x)/8] & 0x0f))
#define GETBITS_1_4(x,in) (((in)[(x)/8] & 0x1e) >> 1)
#define GETBITS_2_4(x,in) (((in)[(x)/8] & 0x3c) >> 2)
#define GETBITS_3_4(x,in) (((in)[(x)/8] & 0x78) >> 3)
#define GETBITS_4_4(x,in) (((in)[(x)/8] & 0xf0) >> 4)
#define GETBITS_5_4(x,in) ((((in)[(x)/8] & 0xe0) >> 5) | (((in)[(x)/8+1] & 0x01)) << 3)
#define GETBITS_6_4(x,in) ((((in)[(x)/8] & 0xc0) >> 6) | (((in)[(x)/8+1] & 0x03)) << 2)
#define GETBITS_7_4(x,in) ((((in)[(x)/8] & 0x80) >> 7) | (((in)[(x)/8+1] & 0x07)) << 1)
// ...etc
These macros would clip out four bits from each bit position 0, 1, 2, etc. (To cut down on the proliferation of pointless parentheses, you might want to use inline functions for the above.) Then perhaps define an inline function like:
inline int GETBITS_4(int x, unsigned char *in) {
switch (x % 8) {
case 0: return GETBITS_0_4(x,in);
case 1: return GETBITS_1_4(x,in);
case 2: return GETBITS_2_4(x,in);
// ...etc
}
}
Since this is a lot of tedious boilerplate code, especially if you've got multiple different widths, you may want to write a program to generate all the GETBIT_*
accessor functions.
(I notice that the bits in your bytes are stored in the reverse order from what I've written above. Apply an appropriate transformation to match your structure if you need to.)
Taking Greg's solution as a basis:
template<unsigned int n, unsigned int m>
inline unsigned long getbits(unsigned long[] bits) {
const unsigned bitsPerLong = sizeof(unsigned long) * CHAR_BIT
const unsigned int bitsToGet = m - n;
BOOST_STATIC_ASSERT(bitsToGet < bitsPerLong);
const unsigned mask = (1UL << bitsToGet) - 1;
const size_t index0 = n / bitsPerLong;
const size_t index1 = m / bitsPerLong;
// Do the bits to extract straddle a boundary?
if (index0 == index1) {
return (bits[index0] >> (n % bitsPerLong)) & mask;
} else {
return ((bits[index0] >> (n % bitsPerLong)) + (bits[index1] << (bitsPerLong - (m % bitsPerLong)))) & mask;
}
}
Can get at least 32 bits, even if they are not aligned. Note that's intentionally inline
as you don't want to have tons of these functions.
If You reverse the bit order in 'arr', then You can eliminate the substraction from the macro. It is the best what i can say, without knowledge of the problem context (how the bits are used).
#define GETBIT(x,in) ((in)[ ((x)/8) ] & 1<<(7-((x)%8)))
can be optimized.
1) Use standard int which is normally the fastest accessible integer datatype. If you don't need to be portable, you can find out the size of an int with sizeof and adapt the following code.
2)
#define GETBIT(x,in) ((in)[ ((x) >>> 3) ] & 1<<((x) & 7))
The mod operator % is slower than ANDing. And you don't need to subtract, simply adjust your SETBIT routine.
Why not create your own wrapper class?
You could then add bits to the "array" using an operator such as + and get back the individual bits using the [] operator.
Your macro could be improved by using & 7 instead of % 8 but its likely the compiler will make that optimisation for you anyway.
I recently did exactly what you are doing and my stream could consist of any number of bits.
So I have something like the following:
BitStream< 1 > oneBitBitStream;
BitStream< 2 > twoBitBitStream;
oneBitBitStream += Bit_One;
oneBitBitStream += Bit_Zero;
twoBitBitStream += Bit_Three;
twoBitBitStream += Bit_One;
and so on. It makes for nice readable code and you can provide an STL like interface to it for aiding faimilarity :)
Since the question is tagged with C++, is there any reason you can't simply use the standard bitset?
Instead of the unsigned char array and custom macros, you can use std::vector<bool>
. The vector class template has a special template specialization for the bool type. This specialization is provided to optimize for space allocation: In this template specialization, each element occupies only one bit (which is eight times less than the smallest type in C++: char).
精彩评论