
Performance of string streams versus file I/O streams in C++

I have to read a huge text file (>200,000 words) and process each word. I read the entire file into a string, then attach a string stream to it so I can process each word easily. The alternative approach is to read each word directly from the file using >> and process it, but comparing the two approaches gives me no advantage in execution time. Isn't it faster to operate on a string in memory than on a file, which needs a system call every time I want a word? Please suggest some ways to improve performance.


For performance and minimal copying, this is hard to beat (as long as you have enough memory!):

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <sstream>

void mapped(const char* fname)
{
  using namespace boost::interprocess;

  //Create a file mapping
  file_mapping m_file(fname, read_only);

  //Map the whole file with read permissions
  mapped_region region(m_file, read_only);

  //Get the address and size of the mapped region
  char *data       = static_cast<char*>(region.get_address());
  std::size_t size = region.get_size();

  // Wrap the mapped bytes in a stream without copying them.
  // Note: pubsetbuf on a stringbuf is implementation-defined;
  // this works with libstdc++ but is not guaranteed by the standard.
  std::stringstream localStream;
  localStream.rdbuf()->pubsetbuf(data, size);

  // now you can do your stuff with the stream
  // (alternatively, parse the mapped buffer in place)
}


If you're going to put the data into a stringstream anyway, it's probably a bit faster and easier to copy directly from the input stream to the string stream:

std::ifstream infile("yourfile.txt");
std::stringstream buffer;

buffer << infile.rdbuf();

The ifstream uses an internal buffer, however, so while this is probably faster than reading into a string and then constructing a stringstream from it, it may not be any faster than working directly from the input stream.


There is caching involved, so it does not necessarily do a system call each time you extract. Having said that, you may get marginally better performance at parse time by parsing a single contiguous buffer. On the other hand, you are serializing the workload (read entire file, then parse), which can potentially be parallelized (read and parse in parallel).


The string will get reallocated and copied an awful lot of times to accommodate 200,000 words. That's probably what is taking the time.

You should use a rope if you want to create a huge string by appending.

