开发者

C++ Arbitrary width container

I have a large lookup table that currently needs 12 bits per entry. Is there a standard class that will give me a memory efficient container for storing odd-sized data? I have about a billion items in the table, so I care more about memory efficiency than speed.

I need to be able to get the underlying data and read/write开发者_运维问答 it to a file as well.


How about this:

#include <stdio.h>

typedef unsigned char byte;
typedef unsigned short word;
typedef unsigned int uint;
typedef unsigned long long int qword;

enum {
  bits_per_cell = 12, cellmask = (1<<bits_per_cell)-1,
  N_cells = 1000000,
  bufsize = (N_cells*bits_per_cell+7)/8,
};

byte* buf;

byte* Alloc( void ) {
  buf = new byte[bufsize];
  return buf;
};

// little-endian only
void put( uint i, uint c ) {
  qword x = qword(i)*bits_per_cell;
  uint  y = x&15, z = (x>>4)<<1;
  uint& a = (uint&)buf[z];
  uint mask = ~(cellmask<<y);
  a = a & mask | ((c&cellmask)<<y);
}

uint get( uint i ) {
  qword x = qword(i)*bits_per_cell;
  uint  y = x&15, z = (x>>4)<<1;
  uint& a = (uint&)buf[z];
  return (a>>y)&cellmask;
}

/* 

// bigendian/universal
void put( uint i, uint c ) {
  qword x = qword(i)*bits_per_cell;
  uint y = x&7, z = (x>>3);
  uint a = buf[z] + (buf[z+1]<<8) + (buf[z+2]<<16);
  uint mask = ~(cellmask<<y);
  a = a & mask | ((c&cellmask)<<y);
  buf[z] = byte(a); buf[z+1]=byte(a>>8); buf[z+2]=byte(a>>16);
}

uint get( uint i ) {
  qword x = qword(i)*bits_per_cell;
  uint  y = x&7, z = (x>>3);
  uint a = buf[z] + (buf[z+1]<<8) + (buf[z+2]<<16);
  return (a>>y)&cellmask;
}
*/

int main( void ) {

  if( Alloc()==0 ) return 1;

  uint i;

  for( i=0; i<N_cells; i++ ) put( i^1, i );

  for( i=0; i<N_cells; i++ ) {
    uint j = i^1, c, d; 
    c = get(j); d = i & cellmask;
    if( c!=d ) printf( "x[%08X]=%04X, not %04X\n", j,c,d );
  }

}


Have you looked at Boost::dynamic_bitset? I'm not saying that it would be the be-all, end all of your dreams but it could help you with some of the characteristics you've described. It's very similar to bitset of the standard library, only with resizeable options.

I might not try to use it by itself to solve your problem. Instead, I might combine it with another container class and use it in conjunction with some sort of mapping scheme. I don't know what type of mapping as it will depend on the data and the frequency of cycles. However, thinking more about this:

std::vector<std::bitset<12> > oneBillionDollars; //Austin Powers, my hero!


You have a problem of packing. The only idea I can get is that you'd want to find a LCM of N and some power of two. Not gonna be so easy, but definitely workable.

Also, you cannot really manipulate some weird sized data, so you need to pack it into a bigger integer type. The table will contain the data packed, but the "accessor" will yield an unpacked one.

// General structure
template <size_t N>
class Pack
{
public:
  static size_t const Number = N;
  static size_t const Density = 0; // number of sets of N bits
  typedef char UnpackedType;       // some integral

  UnpackedType Get(size_t i) const;   // i in [0..Density)
  void Set(size_t i, UnpackedType t); // i in [0..Density)

  // arbitrary representation
};

// Example, for 12 bits
// Note: I assume that all is set, you'll have to pad...
// but for a million one or two more should not be too much of an issue I guess
// if it is, the table shall need one more data member, which is reasonnable
class Pack12
{
public:
  typedef uint16_t UnpackedType;

  static size_t const Number = 12;
  static size_t const Density = 4;

  UnpackedType get(size_t i) const;
  void set(size_t i, UnpackedType t);

private:
  uint16_t data[3];
};

Now we can build on that to build a generic table that'll work for any pack:

template <typename Pack>
class Table
{
public:
  typedef typename Pack::UnpackedType UnpackedType;

  bool empty() const;
  size_t size() const;

  UnpackedType get(size_t i) const;
  void set(size_t i, UnpackedType t);

private:
  static size_t const NumberBits = Pack::Number;
  static size_t const Density = Pack::Density;

  std::deque<Pack> data;
};

template <typename Pack>
bool Table<Pack>::empty() const { return data.empty(); }

template <typename Pack>
size_t Table<Pack>::size() const { return data.size() * Density; }

template <typename Pack>
typename Table<Pack>::UnpackedType Table<Pack>::get(size_t i) const
{
  Pack const& pack = data.at(i / Density);
  return pack.get(i % Density);
}

// Table<Pack>::set is the same

A more clever way would be for Pack<N> to be able to deduce the getters and representations... but it doesn't seem worth the effort because the Pack interface is minimal and Table can present a richer interface without asking more than that.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜