
How does one store a vector<bool> or a bitset into a file, but bit-wise?

Related question: How to write bitset data to a file?

The first answer there doesn't answer the question correctly, since it takes 8 times more space than it should.

How would you do it? I really need it to save a lot of true/false values.


Simplest approach: take eight consecutive boolean values, pack them into a single byte, and write that byte to your file. That saves a lot of space.

At the beginning of the file, write the number of boolean values stored. When reading, that count tells you how many bits of the last byte are meaningful as you convert the bytes back into boolean values.
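A minimal sketch of that approach, assuming a 64-bit little-endian count as the header and least-significant-bit-first packing (both are my choices, not something the answer specifies). The names write_bits, read_bits and check_round_trip are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Writes the bit count as 8 little-endian bytes, then the bits packed
// eight per byte, with bit i stored at position (i % 8) of byte (i / 8).
void write_bits(std::ostream &out, const std::vector<bool> &bits)
{
    const std::uint64_t count = bits.size();
    for (int shift = 0; shift < 64; shift += 8)
        out.put(static_cast<char>((count >> shift) & 0xFF));

    unsigned char byte = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i])
            byte |= static_cast<unsigned char>(1u << (i % 8));
        // Flush when the byte is full or the input is exhausted.
        if (i % 8 == 7 || i + 1 == bits.size()) {
            out.put(static_cast<char>(byte));
            byte = 0;
        }
    }
}

// Reads back what write_bits produced. Returns false if the stream ends early.
bool read_bits(std::istream &in, std::vector<bool> &bits)
{
    std::uint64_t count = 0;
    for (int shift = 0; shift < 64; shift += 8) {
        const int ch = in.get();
        if (ch == std::istream::traits_type::eof())
            return false;
        count |= static_cast<std::uint64_t>(ch) << shift;
    }

    bits.assign(static_cast<std::size_t>(count), false);
    int byte = 0;
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (i % 8 == 0) {
            byte = in.get();
            if (byte == std::istream::traits_type::eof())
                return false;
        }
        bits[i] = ((byte >> (i % 8)) & 1) != 0;
    }
    return true;
}

// Small self-check: serializes to an in-memory stream and reads it back.
void check_round_trip(const std::vector<bool> &bits)
{
    std::stringstream buffer;
    write_bits(buffer, bits);
    std::vector<bool> restored;
    const bool ok = read_bits(buffer, restored);
    assert(ok && restored == bits);
    (void)ok;
}
```

For a real file, pass a std::ofstream / std::ifstream opened with std::ios::binary instead of the string stream.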


If you want the bitset class that best supports conversion to binary, and your bitset is larger than an unsigned long, then the best option is boost::dynamic_bitset. (I presume it is larger than 32 or even 64 bits if you are that concerned about saving space.)

With dynamic_bitset you can use the free function to_block_range(bitset, output_iterator) to copy the bits out as values of the underlying integral type (Block). You can rebuild the dynamic_bitset from the blocks with from_block_range(first, last, bitset) on a bitset that is already large enough, with its constructor taking a pair of block input iterators, or with append() calls.

Now that you have the bits in their native block format (Block), you still have the issue of writing them to a stream and reading them back.

You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Alternatively, use a macro to convert to a standard endianness (e.g. ntohl). Ideally the macro is a no-op on your most common platform, so if that platform is little-endian you probably want to store the data little-endian and convert only on big-endian systems.

(Note: I am assuming that boost::dynamic_bitset converts to integral blocks the same way regardless of the underlying endianness. Its documentation does not say.)

To write the blocks to a stream in binary, use os.write( reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks ), and to read them back use is.read( reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks ), where data is a vector<Block>. The casts are needed because write() and read() take char pointers. Before reading you must call data.resize(nBlocks) (not reserve()). (You can also do odd things with istream_iterator or istreambuf_iterator, but resize() is probably better.)
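The header-plus-blocks layout can be sketched with the standard library alone. This assumes the blocks have already been obtained (for example via to_block_range) and that Block is a 32-bit unsigned type; the header fields (bit count, then block count) and the helper names put_le, get_le, write_blocks, read_blocks and blocks_to_bytes are my own illustration. Each value is written byte by byte, least significant first, so no endianness macro is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

using Block = std::uint32_t;  // assumption: the bitset's block type

// Writes an unsigned integer as sizeof(T) bytes, least significant first,
// so the file does not depend on the host byte order.
template <typename T>
void put_le(std::ostream &os, T value)
{
    for (std::size_t i = 0; i < sizeof(T); ++i)
        os.put(static_cast<char>((value >> (8 * i)) & 0xFF));
}

// Reads an unsigned integer written by put_le. Returns false on early EOF.
template <typename T>
bool get_le(std::istream &is, T &value)
{
    value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        const int ch = is.get();
        if (ch == std::istream::traits_type::eof())
            return false;
        value |= static_cast<T>(static_cast<T>(ch) << (8 * i));
    }
    return true;
}

// Header: number of valid bits, then number of blocks. Blocks follow.
void write_blocks(std::ostream &os, const std::vector<Block> &blocks,
                  std::uint64_t bit_count)
{
    assert(bit_count <= static_cast<std::uint64_t>(blocks.size()) * sizeof(Block) * 8);
    put_le<std::uint64_t>(os, bit_count);
    put_le<std::uint64_t>(os, blocks.size());
    for (Block b : blocks)
        put_le(os, b);
}

bool read_blocks(std::istream &is, std::vector<Block> &blocks,
                 std::uint64_t &bit_count)
{
    std::uint64_t n_blocks = 0;
    if (!get_le(is, bit_count) || !get_le(is, n_blocks))
        return false;
    blocks.resize(static_cast<std::size_t>(n_blocks));  // resize(), not reserve()
    for (Block &b : blocks)
        if (!get_le(is, b))
            return false;
    return true;
}

// Convenience wrapper: serializes to an in-memory byte string.
std::string blocks_to_bytes(const std::vector<Block> &blocks, std::uint64_t bit_count)
{
    std::ostringstream os;
    write_blocks(os, blocks, bit_count);
    return os.str();
}
```

Writing byte by byte is slower than one bulk os.write() call, but it keeps the file format identical on little-endian and big-endian hosts.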


Here is an attempt with two functions that use a minimal number of bytes, without compressing the bitset.

#include <bitset>
#include <cstddef>
#include <istream>
#include <ostream>

// The template parameter must be std::size_t (not int) so that it can be
// deduced from std::bitset<I>.
template<std::size_t I>
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
    // Export a bitset of I bits to an output stream.
    // Eight bits are stored in each stream byte, with the first bit
    // in the most significant position.
    unsigned char c = 0; // the byte being assembled
    short bits = 0;      // number of bits already placed in c
    for(std::size_t i = 0; i < in.size(); ++i)
    {
        c = static_cast<unsigned char>(c << 1);
        if(in.test(i)) ++c; // add 1 if the bit is set (std::bitset has no at())
        ++bits;
        if(bits == 8)
        {
            out.put(static_cast<char>(c));
            c = 0;
            bits = 0;
        }
    }
    // Dump the remaining bits, if any.
    if(bits != 0) {
        // Pad the byte so that the first bits are in the most significant positions.
        c = static_cast<unsigned char>(c << (8 - bits));
        out.put(static_cast<char>(c));
    }
}

template<std::size_t I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
    // Read bytes from the input stream into a bitset of size I.
    // Stops early if the stream runs out of data.
    unsigned char mask = 0x80;   // current bit mask
    unsigned char c = 0;         // current byte from the stream
    for(std::size_t i = 0; i < I; ++i)
    {
        if((i % 8) == 0)         // retrieve the next byte
        {
            const int next = in.get();
            if(next == std::istream::traits_type::eof()) break;
            c = static_cast<unsigned char>(next);
            mask = 0x80;
        }
        else mask = mask >> 1;   // shift the mask
        out.set(i, (c & mask) != 0);
    }
}

Note that using a reinterpret_cast to treat the memory occupied by the bitset as an array of chars would probably also work, but it is not portable across systems because the internal representation of the bitset (block size, endianness) is unspecified.


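How about this? It writes only the non-zero blocks, each preceded by its block index. Note that it relies on libstdc++ internals (std::_S_word_bit, std::_Bit_type and the iterator's _M_p member), so it only works with that standard library.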

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>

...
{
  std::srand(std::time(nullptr));
  std::vector<bool> vct1, vct2;
  vct1.resize(20000000, false);
  vct2.resize(20000000, false);
  // insert some data
  for (size_t i = 0; i < 1000000; i++) {
    vct1[std::rand() % 20000000] = true;
  }
  
  // serialize to file
  std::ofstream ofs("bitset", std::ios::out | std::ios::binary | std::ios::trunc);
  for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
    auto vct1_iter = vct1.begin();
    vct1_iter += i;
    uint32_t block_num = i / std::_S_word_bit;
    std::_Bit_type block_val = *(vct1_iter._M_p);
    if (block_val != 0) {
      // only write not-zero block
      ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
      ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
    }
  }
  ofs.close();

  // deserialize
  std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
  ifs.seekg(0, std::ios::end);
  uint64_t file_size = ifs.tellg();
  ifs.seekg(0);
  uint64_t load_size = 0;
  while (load_size < file_size) {
    uint32_t block_num;
    ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
    std::_Bit_type block_value;
    ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
    load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
    auto offset = block_num * std::_S_word_bit;
    if (offset >= vct2.size()) {
      std::cout << "error! already touch end" << std::endl;
      break;
    }
    auto iter = vct2.begin();
    iter += offset;
    *(iter._M_p) = block_value;
  }
  ifs.close();

  // check result
  int count_true1 = std::count(vct1.begin(), vct1.end(), true);
  int count_true2 = std::count(vct2.begin(), vct2.end(), true);
  std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;

}


One way might be:

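#include <climits>  // CHAR_BIT
#include <cmath>    // std::ceil
#include <cstddef>
#include <vector>

std::vector<bool> data = /* obtain bits somehow */;

// Allocate an appropriate number of byte-sized buckets, zero-initialized.
std::vector<unsigned char> bytes((std::size_t)std::ceil((double)data.size() / CHAR_BIT));

for(std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
   for(std::size_t bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
       std::size_t index = byteIndex * CHAR_BIT + bitIndex;
       if(index >= data.size()) break;  // the last byte may be only partly used

       int bit = data[index];
       bytes[byteIndex] |= bit << bitIndex;
   }
}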

Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector you started with by reversing the loop.

(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
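One possibly tidier bucket computation is integer ceiling division, which avoids floating point altogether. The sketch below pairs it with the reverse (unpacking) step that the answer describes but does not show; bytes_needed, pack and unpack are hypothetical names, and the bit order matches the loop above (bit i goes to position i % CHAR_BIT of byte i / CHAR_BIT).

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Number of bytes needed to hold bit_count bits (integer ceiling division).
std::size_t bytes_needed(std::size_t bit_count)
{
    return (bit_count + CHAR_BIT - 1) / CHAR_BIT;
}

// Packs the bits, CHAR_BIT per byte, least significant position first.
std::vector<unsigned char> pack(const std::vector<bool> &data)
{
    std::vector<unsigned char> bytes(bytes_needed(data.size()), 0);
    for (std::size_t i = 0; i < data.size(); ++i)
        if (data[i])
            bytes[i / CHAR_BIT] |= static_cast<unsigned char>(1u << (i % CHAR_BIT));
    return bytes;
}

// Restores exactly bit_count bits; the stored count tells us where to stop
// inside the last byte.
std::vector<bool> unpack(const std::vector<unsigned char> &bytes, std::size_t bit_count)
{
    assert(bytes.size() >= bytes_needed(bit_count));
    std::vector<bool> data(bit_count);
    for (std::size_t i = 0; i < bit_count; ++i)
        data[i] = ((bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u) != 0;
    return data;
}
```

The byte vector can then go to a binary stream with a single write() of bytes.data() and bytes.size(), preceded by the bit count.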


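#include <cstdio>
#include <bitset>
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
std::bitset<size> bitbuffer;
...
// Dumps the object's raw memory. This relies on the implementation storing
// the bits contiguously at the start of the object, which the standard
// does not guarantee, so the file is not portable.
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);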


Two options:

Spend the extra pounds (or pence, more likely) for a bigger disk.

Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.
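A rough sketch of the second option for std::bitset, assuming 8-bit bytes and that the lowest-numbered bits go into the first byte. The names bitset_to_bytes and bitset_from_bytes are hypothetical. Since N is a compile-time constant, no length header is needed here.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Extracts eight bits at a time from the low end of the bitset and
// appends each group as one byte.
template <std::size_t N>
std::string bitset_to_bytes(std::bitset<N> bits)
{
    const std::bitset<N> low_byte(0xFFul);  // mask for the lowest eight bits
    std::string bytes;
    bytes.reserve((N + 7) / 8);
    for (std::size_t done = 0; done < N; done += 8) {
        bytes.push_back(static_cast<char>((bits & low_byte).to_ulong()));
        bits >>= 8;  // move the next eight bits into the low positions
    }
    return bytes;
}

// Rebuilds the bitset from bytes produced by bitset_to_bytes.
template <std::size_t N>
std::bitset<N> bitset_from_bytes(const std::string &bytes)
{
    assert(bytes.size() >= (N + 7) / 8);
    std::bitset<N> bits;
    for (std::size_t i = 0; i < N; ++i)
        bits[i] = ((static_cast<unsigned char>(bytes[i / 8]) >> (i % 8)) & 1u) != 0;
    return bits;
}
```

The resulting string can be written with out.write(bytes.data(), bytes.size()) on a stream opened in binary mode.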
