
My version of C++ non-member getline(), that takes a FILE* (created by _wfopen()) instead of a stream, is too slow

In C++, you can use non-member getline() with a stream in a loop like this:

#include <string>
#include <fstream>
#include <cstdlib>
using namespace std;

int main() {
    ifstream in("file.txt");
    if (!in) {
        return EXIT_FAILURE;
    }
    for (string line; getline(in, line); ) {
        // Do stuff with each line
    }
}

However, I want to do that with a FILE* created by _wfopen("file.txt", "r") instead, so I created one:

#include <cstdio>
#include <string>
#include <cstdlib>
#include <cwchar>
using namespace std;

bool getline(FILE* const in, string& s) {
    int c = fgetc(in);
    if (c == EOF) {
        return false;
    }
    s.clear();
    while (c != EOF && c != 10 && c != 13) {
        s += c;
        c = fgetc(in);
    }
    return true;
}

int main() {
    FILE* const in = _wfopen(L"file.txt", L"r");
    if (!in) {
        return EXIT_FAILURE;
    }
    for (string line; getline(in, line); ) {
        // Do stuff with the line
    }
    if (in) {
        fclose(in);
    }
}

It handles newlines like I want and works in a loop like I want. It's just too slow because I'm reading one char at a time and inserting one char in the string at a time. For example, it takes 6 seconds to process a 12MB file while the original getline does it virtually instantly. That's not that big of a deal for a small file, but for a 2GB file for example, it'd be a problem.

I'd like it to be as fast as C++'s getline(), but I don't think I can make it any faster without redesigning it.

So, how should I redesign it so it's more efficient?

I know I should fread in chunks into a buffer (a vector for example and resize when needed) till I find() a newline or newline pair in it and append the range to the string. However, I'm not really picturing how to make it work like my char-by-char version, especially if I read in too much and have to put data after the newline or newline pair back into the stream so it can be consumed on the next iteration.

Now, VC++ has a wifstream that takes a FILE* and STLPort might have one too. But, I'm using just Mingw 4.4.1. (I don't want to use STLPort because it's a pain in the ass to build with Mingw.)

The reason I need to use a FILE* is because that's what _wfopen() returns. I need to use _wfopen() because it supports wchar_t* paths, which I will be getting from the wchar_t** array returned by the Windows function CommandLineToArgvW(GetCommandLineW(), &argc). ifstream doesn't take a wide path.

Thanks


You should be using C++ I/O facilities if you're programming in C++. Having said that...

First, you are checking for newline by checking for 10 and 13. You should open your file in text mode, and check for '\n' instead. This method is portable, and works with different line-end conventions, as well as on non-ASCII systems.

Assuming you have to use native C FILE *, I would do it this way:

#include <cstdio>
#include <cstring>
#include <string>

bool cgetline(FILE* const in, std::string &s)
{
    char buf[BUFSIZ+1] = {0};
    s.clear();
    while (fgets(buf, sizeof buf, in) != NULL) {
        char *end = strchr(buf, '\n');
        if (end == NULL) {
            /* We didn't see a newline at the end of the line,
               if we hit the end of file, then the last line wasn't terminated
               with a newline character.  Return it anyway. */
            if (feof(in)) {
                s.append(buf, strlen(buf));
                return true;
            } else {
                s.append(buf, sizeof buf - 1);
            }
        } else {
            s.append(buf, end - buf);
            return true;
        }
    }
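    /* fgets() failed. If we already collected part of a line (an unterminated
       last line that exactly filled the buffer), still return it. */
    return !s.empty();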
}

The extra complexity comes from making sure the program does the right thing when the last line of a file doesn't end with a newline character.

Reading from a file character-by-character and appending to a string is probably why your version is slow.
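For the chunked fread() approach the question describes, one way to avoid putting data back into the stream is to keep the leftover bytes in a small reader object that owns the buffer. Below is a minimal sketch of that idea, not a tuned implementation. The class name LineReader and the 64 KB chunk size are assumptions made for illustration; it also assumes the FILE* was opened in text mode (so only '\n' needs checking) and that nothing else reads from the same FILE* while the reader is in use.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original answers): reads lines from a
// FILE* in large chunks and keeps any unread bytes in its own buffer.
class LineReader {
public:
    explicit LineReader(FILE* in) : in_(in), pos_(0), len_(0) {
        assert(in != NULL);
    }

    // Stores the next line in s without its '\n'. Returns false only when
    // there is nothing left to read.
    bool getline(std::string& s) {
        s.clear();
        bool got_any = false;
        for (;;) {
            if (pos_ == len_) {                  // buffer used up: refill it
                len_ = std::fread(buf_, 1, sizeof buf_, in_);
                pos_ = 0;
                if (len_ == 0) {
                    return got_any;              // EOF (or a read error)
                }
            }
            got_any = true;
            const char* start = buf_ + pos_;
            const void* nl = std::memchr(start, '\n', len_ - pos_);
            if (nl != NULL) {
                const char* end = static_cast<const char*>(nl);
                const std::size_t n = static_cast<std::size_t>(end - start);
                s.append(start, n);
                pos_ += n + 1;                   // skip past the '\n'
                return true;
            }
            s.append(start, len_ - pos_);        // partial line: keep reading
            pos_ = len_;
        }
    }

private:
    FILE* in_;
    char buf_[64 * 1024];   // chunk size is an arbitrary choice
    std::size_t pos_;       // index of the next unread byte in buf_
    std::size_t len_;       // number of valid bytes in buf_
};
```

Usage mirrors the original loop: construct LineReader reader(in); once after _wfopen() succeeds, then write for (std::string line; reader.getline(line); ) { ... }. Because the unread tail of each chunk stays in buf_, nothing ever has to be pushed back into the FILE*.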


It's possible that your std::string implementation does not grow strings in a way that's efficient for appending many characters one-by-one. One thing to try might be to use std::string::reserve() to double the string capacity when the buffer is full.
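As a rough sketch of the reserve() idea, here is the question's char-by-char loop with the capacity doubled by hand. The function name getline_reserved and the starting capacity of 256 are assumptions for illustration. Whether this helps at all depends on the std::string implementation, since many already grow geometrically, so it is worth measuring before relying on it.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical variant of the question's getline(): same char-by-char loop,
// but the string's capacity is doubled explicitly whenever it runs out.
bool getline_reserved(FILE* const in, std::string& s) {
    assert(in != NULL);
    int c = std::fgetc(in);
    if (c == EOF) {
        return false;
    }
    s.clear();
    if (s.capacity() < 256) {
        s.reserve(256);                   // 256 is an arbitrary starting guess
    }
    while (c != EOF && c != '\n') {
        if (s.size() == s.capacity()) {
            s.reserve(s.capacity() * 2);  // grow geometrically
        }
        s += static_cast<char>(c);
        c = std::fgetc(in);
    }
    return true;
}
```

Since the loop in main() reuses the same string object, its capacity usually carries over from one line to the next, so the reserve() calls should become rare after the first few long lines.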

Edit: BTW, I should add that if you're expecting to open the FILE* in text mode, you do not need to check against both \n and \r, as newline conversion appropriate for the platform is performed automatically by the C stdio functions in text mode. (If, however, you intend to read files created on other platforms (e.g. reading Windows files on Unix), then you would need to check for the various types of line endings.)
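If the files may come from another platform, one simple option is to split on '\n' only and then drop a trailing '\r'. The sketch below does that on top of fgets(). The names strip_trailing_cr and getline_crlf are hypothetical. It assumes lines end in either "\n" or "\r\n" and that the file has no embedded '\0' bytes; a lone '\r' as a line terminator (classic Mac style) is not handled.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: removes one trailing '\r', so a line that ended in
// "\r\n" looks the same as one that ended in "\n".
void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
}

// Reads one '\n'-terminated line with fgets(), then strips a trailing '\r'.
// Returns false only when nothing at all could be read.
bool getline_crlf(FILE* const in, std::string& s) {
    assert(in != NULL);
    s.clear();
    char buf[BUFSIZ];
    bool got_any = false;
    while (std::fgets(buf, sizeof buf, in) != NULL) {
        got_any = true;
        const std::size_t n = std::strlen(buf);
        if (n > 0 && buf[n - 1] == '\n') {
            s.append(buf, n - 1);        // complete line: drop the '\n'
            break;
        }
        s.append(buf, n);                // long line or unterminated last line
    }
    if (got_any) {
        strip_trailing_cr(s);
    }
    return got_any;
}
```

With this, a file containing "a\r\nb\n" yields the lines "a" and "b" regardless of which platform wrote it.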
