C++ Arbitrary width container
I have a large lookup table that currently needs 12 bits per entry. Is there a standard class that will give me a memory efficient container for storing odd-sized data? I have about a billion items in the table, so I care more about memory efficiency than speed.
I need to be able to get the underlying data and read/write开发者_运维问答 it to a file as well.
How about this:
#include <stdio.h>
typedef unsigned char byte;
typedef unsigned short word;
typedef unsigned int uint;
typedef unsigned long long int qword;
enum {
bits_per_cell = 12, cellmask = (1<<bits_per_cell)-1,
N_cells = 1000000,
bufsize = (N_cells*bits_per_cell+7)/8,
};
byte* buf;
byte* Alloc( void ) {
buf = new byte[bufsize];
return buf;
};
// little-endian only
void put( uint i, uint c ) {
qword x = qword(i)*bits_per_cell;
uint y = x&15, z = (x>>4)<<1;
uint& a = (uint&)buf[z];
uint mask = ~(cellmask<<y);
a = a & mask | ((c&cellmask)<<y);
}
uint get( uint i ) {
qword x = qword(i)*bits_per_cell;
uint y = x&15, z = (x>>4)<<1;
uint& a = (uint&)buf[z];
return (a>>y)&cellmask;
}
/*
// bigendian/universal
void put( uint i, uint c ) {
qword x = qword(i)*bits_per_cell;
uint y = x&7, z = (x>>3);
uint a = buf[z] + (buf[z+1]<<8) + (buf[z+2]<<16);
uint mask = ~(cellmask<<y);
a = a & mask | ((c&cellmask)<<y);
buf[z] = byte(a); buf[z+1]=byte(a>>8); buf[z+2]=byte(a>>16);
}
uint get( uint i ) {
qword x = qword(i)*bits_per_cell;
uint y = x&7, z = (x>>3);
uint a = buf[z] + (buf[z+1]<<8) + (buf[z+2]<<16);
return (a>>y)&cellmask;
}
*/
int main( void ) {
if( Alloc()==0 ) return 1;
uint i;
for( i=0; i<N_cells; i++ ) put( i^1, i );
for( i=0; i<N_cells; i++ ) {
uint j = i^1, c, d;
c = get(j); d = i & cellmask;
if( c!=d ) printf( "x[%08X]=%04X, not %04X\n", j,c,d );
}
}
Have you looked at Boost::dynamic_bitset? I'm not saying that it would be the be-all, end all of your dreams but it could help you with some of the characteristics you've described. It's very similar to bitset of the standard library, only with resizeable options.
I might not try to use it by itself to solve your problem. Instead, I might combine it with another container class and use it in conjunction with some sort of mapping scheme. I don't know what type of mapping as it will depend on the data and the frequency of cycles. However, thinking more about this:
std::vector<std::bitset<12> > oneBillionDollars; //Austin Powers, my hero!
You have a problem of packing. The only idea I can get is that you'd want to find a LCM of N
and some power of two. Not gonna be so easy, but definitely workable.
Also, you cannot really manipulate some weird sized data, so you need to pack it into a bigger integer type. The table will contain the data packed, but the "accessor" will yield an unpacked one.
// General structure
template <size_t N>
class Pack
{
public:
static size_t const Number = N;
static size_t const Density = 0; // number of sets of N bits
typedef char UnpackedType; // some integral
UnpackedType Get(size_t i) const; // i in [0..Density)
void Set(size_t i, UnpackedType t); // i in [0..Density)
// arbitrary representation
};
// Example, for 12 bits
// Note: I assume that all is set, you'll have to pad...
// but for a million one or two more should not be too much of an issue I guess
// if it is, the table shall need one more data member, which is reasonnable
class Pack12
{
public:
typedef uint16_t UnpackedType;
static size_t const Number = 12;
static size_t const Density = 4;
UnpackedType get(size_t i) const;
void set(size_t i, UnpackedType t);
private:
uint16_t data[3];
};
Now we can build on that to build a generic table that'll work for any pack:
template <typename Pack>
class Table
{
public:
typedef typename Pack::UnpackedType UnpackedType;
bool empty() const;
size_t size() const;
UnpackedType get(size_t i) const;
void set(size_t i, UnpackedType t);
private:
static size_t const NumberBits = Pack::Number;
static size_t const Density = Pack::Density;
std::deque<Pack> data;
};
template <typename Pack>
bool Table<Pack>::empty() const { return data.empty(); }
template <typename Pack>
size_t Table<Pack>::size() const { return data.size() * Density; }
template <typename Pack>
typename Table<Pack>::UnpackedType Table<Pack>::get(size_t i) const
{
Pack const& pack = data.at(i / Density);
return pack.get(i % Density);
}
// Table<Pack>::set is the same
A more clever way would be for Pack<N>
to be able to deduce the getters and representations... but it doesn't seem worth the effort because the Pack
interface is minimal and Table
can present a richer interface without asking more than that.
精彩评论