Fastest way to write large STL vector to file using STL

I have a large vector (10^9 elements) of chars, and I was wondering what is the fastest way to write such a vector to a file. So far I've been using the following code:

vector<char> vs;
// ... Fill vector with data
ofstream outfile("nanocube.txt", ios::out | ios::binary);
ostream_iterator<char> oi(outfile, '\0');
copy(vs.begin(), vs.end(), oi);

With this code it takes approximately two minutes to write all the data to the file. The actual question is: "Can I make it faster using the STL, and how?"


With such a large amount of data to be written (~1GB), you should write to the output stream directly, rather than using an output iterator. Since the data in a vector is stored contiguously, this will work and should be much faster.

ofstream outfile("nanocube.txt", ios::out | ios::binary);
outfile.write(&vs[0], vs.size());


There is a slight conceptual error in your second argument to ostream_iterator's constructor. If you don't want a delimiter, it should be a null pointer (although, luckily for you, '\0' will be treated as one implicitly), or the second argument should be omitted altogether.

However, this means that after writing each character, the code needs to check for the pointer designating the delimiter (which might be somewhat inefficient).

I think if you want to go with iterators, you could try ostreambuf_iterator.
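As a rough sketch of that idea (the function name is mine, and the file name is reused from the question):

#include <algorithm>
#include <fstream>
#include <iterator>
#include <vector>
using namespace std;

// ostreambuf_iterator writes each char straight into the stream buffer,
// skipping the per-character delimiter check that ostream_iterator does.
void writeWithStreambufIterator(const vector<char> &vs)
{
    ofstream outfile("nanocube.txt", ios::out | ios::binary);
    copy(vs.begin(), vs.end(), ostreambuf_iterator<char>(outfile));
}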

Other options might include using the write() method (if it can handle output this large, or perhaps output it in chunks), and perhaps OS-specific output functions.


Since your data is contiguous in memory (as Charles said), you can use low-level I/O. On Unix or Linux, you can write to a file descriptor. On Windows XP, use file handles. (It's a little trickier on XP, but well documented on MSDN.)

XP is a little funny about buffering. If you write a 1GB block to a handle, it will be slower than if you break the write up into smaller transfer sizes (in a loop). I've found that 256KB writes are the most efficient. Once you've written the loop, you can play around with the transfer size and see which is fastest.
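Here is a minimal sketch of the Unix/Linux route, assuming POSIX open/write/close (the function name and error handling are illustrative):

#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <vector>

// Write the vector through a raw file descriptor in 256KB chunks.
bool writeChunked(const std::vector<char> &vs, const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;

    const size_t chunk = 262144; // 256KB transfer size
    size_t offset = 0;
    while (offset < vs.size()) {
        size_t n = std::min(chunk, vs.size() - offset);
        ssize_t written = write(fd, &vs[offset], n);
        if (written < 0) { close(fd); return false; }
        offset += static_cast<size_t>(written);
    }
    return close(fd) == 0;
}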


OK, I wrote a method with a for loop that writes 256KB blocks of data (as Rob suggested) on each iteration, and the result is 16 seconds, so problem solved. This is my humble implementation, so feel free to comment:

#include <algorithm>
#include <fstream>
#include <vector>
using namespace std;

void writeCubeToFile(const vector<char> &vs)
{
    const size_t blocksize = 262144; // 256KB per write, as suggested above

    ofstream outfile("nanocube.txt", ios::out | ios::binary);

    // Write full blocks; the final iteration writes whatever remains.
    for (size_t position = 0; position < vs.size(); position += blocksize)
    {
        const size_t count = min(blocksize, vs.size() - position);
        outfile.write(&vs[position], count);
    }

    outfile.write("\0", 1); // trailing null, as in the original version

    outfile.close();
}

Thanks to all of you.


If you have a different element structure, this method is still valid.

For example:

#include <fstream>
#include <utility>
#include <vector>
using namespace std;

typedef std::pair<int,int> STL_Edge;
vector<STL_Edge> v;

void write_file(const char *path){
   ofstream outfile(path, ios::out | ios::binary);
   // The pairs are stored contiguously, so dump the raw bytes in one call.
   outfile.write((const char *)&v.front(), v.size()*sizeof(STL_Edge));
}

void read_file(const char *path, int reserveSpaceForEntries){
   ifstream infile(path, ios::in | ios::binary);
   v.resize(reserveSpaceForEntries);
   infile.read((char *)&v.front(), v.size()*sizeof(STL_Edge));
}
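To use it, something like this hypothetical round trip would do (the file name edges.bin is just for illustration):

int main()
{
    v.push_back(STL_Edge(1, 2));
    v.push_back(STL_Edge(3, 4));
    write_file("edges.bin");

    v.clear();
    read_file("edges.bin", 2); // we know two entries were written
    return 0;
}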


Instead of writing via the file i/o methods, you could try to create a memory-mapped file, and then copy the vector to the memory-mapped file using memcpy.
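A minimal sketch of that idea, assuming a POSIX system with mmap (the function name and error handling are illustrative):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>
#include <vector>

// Size the file, map it, and memcpy the vector's bytes into the mapping.
bool writeViaMmap(const std::vector<char> &vs, const char *path)
{
    if (vs.empty()) return false; // mmap of length 0 is not allowed

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    if (ftruncate(fd, vs.size()) != 0) { close(fd); return false; }

    void *map = mmap(NULL, vs.size(), PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return false; }

    memcpy(map, &vs[0], vs.size());

    munmap(map, vs.size());
    return close(fd) == 0;
}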


Use the write method on it; it is in RAM after all, and you have contiguous memory. Fastest, while keeping flexibility for later? Lose the built-in buffering, hint sequential I/O, skip the hidden overhead of iterators and utilities, avoid streambuf when you can, and get your hands dirty with boost::asio.
