开发者

Equivalent of a python generator in C++ for buffered reads

Guido Van Rossum demonstrates the simplicity of Python in this article and makes use of this function for buffered reads of a file of unknown length:

def intsfromfile(f):
    while True:
        a = array.array('i')
        a.fromstring(f.read(4000))
        if not a:
            break
        for x in a:
            yield x

I need to do the same thing in C++ for speed reasons! I have many files containing sorted lists of unsigned 64 bit integers that I need to merge. I have found this nice piece o开发者_StackOverflow社区f code for merging vectors.

I am stuck on how to make an ifstream for a file of unknown length present itself as a vector which can be happily iterated over until the end of the file is reached. Any suggestions? Am I barking up the correct tree with an istreambuf_iterator?


In order to disguise an ifstream (or really, any input stream) in a form that acts like an iterator, you want to use the istream_iterator or the istreambuf_iterator template class. The former is useful for files where the formatting is of concern. For example, a file full of whitespace-delimited integers can be read into the vector's iterator range constructor as follows:

#include <fstream>
#include <vector>
#include <iterator> // needed for istream_iterator

using namespace std;

int main(int argc, char** argv)
{
    ifstream infile("my-file.txt");

    // It isn't customary to declare these as standalone variables,
    // but see below for why it's necessary when working with
    // initializing containers.
    istream_iterator<int> infile_begin(infile);
    istream_iterator<int> infile_end;

    vector<int> my_ints(infile_begin, infile_end);

    // You can also do stuff with the istream_iterator objects directly:
    // Careful! If you run this program as is, this won't work because we
    // used up the input stream already with the vector.

    int total = 0;
    while (infile_begin != infile_end) {
        total += *infile_begin;
        ++infile_begin;
    }

    return 0;
}

istreambuf_iterator is used to read through files a single character at a time, disregarding the formatting of the input. That is, it will return you all characters, including spaces, newline characters, and so on. Depending on your application, that may be more appropriate.

Note: Scott Meyers explains in Effective STL why the separate variable declarations for istream_iterator are needed above. Normally, you would do something like this:

ifstream infile("my-file.txt");
vector<int> my_ints(istream_iterator<int>(infile), istream_iterator<int>());

However, C++ actually parses the second line in an incredibly bizarre way. It sees it as the declaration of a function named my_ints that takes in two parameters and returns a vector<int>. The first parameter is of type istream_iterator<int> and is named infile (the parantheses are ignored). The second parameter is a function pointer with no name that takes zero arguments (because of the parantheses) and returns an object of type istream_iterator<int>.

Pretty cool, but also pretty aggravating if you're not watching out for it.


EDIT

Here's an example using the istreambuf_iterator to read in a file of 64-bit numbers laid out end-to-end:

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

int main(int argc, char** argv)
{
    ifstream input("my-file.txt");
    istreambuf_iterator<char> input_begin(input);
    istreambuf_iterator<char> input_end;

    // Fill a char vector with input file's contents:
    vector<char> char_input(input_begin, input_end);
    input.close();

    // Convert it to an array of unsigned long with a cast:
    unsigned long* converted = reinterpret_cast<unsigned long*>(&char_input[0]);
    size_t num_long_elements = char_input.size() * sizeof(char) / sizeof(unsigned long);

    // Put that information into a vector:
    vector<unsigned long> long_input(converted, converted + num_long_elements);

    return 0;
}

Now, I personally rather dislike this solution (using reinterpret_cast, exposing char_input's array), but I'm not familiar enough with istreambuf_iterator to comfortably use one templatized over 64-bit characters, which would make this much easier.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜