
C++: reading formatted tabulated data from a file using Boost / STL / etc.

What is the best way for me to create a formatted data reading/writing function for a tabulated text file? Say, some calls like:

readElement(i,j)

insertRow(elem[])

readColHeaders()

I was wondering if any existing wrapper can do that?

The internal format is tab-separated data or CSV.

Thanks, Egon


There are lots of CSV readers, but I never found a nice one.

The easiest approach is to use boost::tokenizer to fill a vector<vector<string> > from your file. A fancier way is to use boost::spirit (but the learning curve is a rollercoaster).

Writing the data back to a file is trivial: just iterate over the vector<vector<string> >.


There is no "standard" CSV reader/writer for C or C++. That doesn't mean you can't find some preexisting library code to use, but there is no one library to rule them all. In my job we make heavy use of CSV files, so I went ahead and rolled my own to fit my workflow as closely as possible. I can tell you some of the things I've done in my library that have worked out reasonably well, should you also want to do your own thing:

  • I keep the data as a vector of vectors of boost::any. I let the user specify the format of the data in the constructor, similar to how you would pass a format to scanf. This keeps the user from having to do their own casts. I use boost::tokenizer and boost::lexical_cast to do the actual splitting and casting. This obviously won't work well if your CSV files can't fit in memory, but that is rarely a problem for me.

  • I have a templated get() that does the any_cast and returns the data as the correct type.

  • I have a hash of column names to their index, so as to support lookups by column name rather than just positional lookups.

  • I allow the user to specify a "primary key" of some combination of columns and then keep a hash such that for each row you have a mapping of the values in the key -> row number. For instance, if you are reading equities data, you might want to find the row based on the CUSIP or ticker, rather than iterate over the entire data to find your row.

  • Let the user specify a size hint so you can reserve() in your storage

  • Let the user specify callback functions so that he can process and filter lines he doesn't want as you read/write them

  • Allow the user to specify if the file needs to be locked when reading/writing

  • Allow the user to pass in his own column header for files that do not have the header in the file
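The column-name lookup, the primary-key index, the size hint and the templated get() from the list above could be sketched roughly as follows. This is not the library described in this answer: the class name `IndexedTable` and all member names are hypothetical, cells are stored as strings and converted with std::istringstream in place of boost::any and boost::lexical_cast, std::map stands in for the hash, and the key is assumed to be a single column.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration only; all names here are invented.
class IndexedTable {
public:
    typedef std::vector<std::string> Row;

    // `header` names the columns; `keyColumn` is the single column used as key.
    IndexedTable(const Row& header, const std::string& keyColumn)
        : numCols_(header.size()), keyCol_(0) {
        for (std::size_t i = 0; i < header.size(); ++i) colIndex_[header[i]] = i;
        keyCol_ = column(keyColumn);
    }

    // Size hint so the row storage can be reserved up front.
    void reserve(std::size_t sizeHint) { rows_.reserve(sizeHint); }

    void insertRow(const Row& row) {
        if (row.size() != numCols_) throw std::invalid_argument("wrong field count");
        keyIndex_[row[keyCol_]] = rows_.size();  // key value -> row number
        rows_.push_back(row);
    }

    // Column name -> positional index.
    std::size_t column(const std::string& name) const {
        std::map<std::string, std::size_t>::const_iterator it = colIndex_.find(name);
        if (it == colIndex_.end()) throw std::out_of_range("unknown column: " + name);
        return it->second;
    }

    // Key value -> row number, without scanning all rows.
    std::size_t rowForKey(const std::string& key) const {
        std::map<std::string, std::size_t>::const_iterator it = keyIndex_.find(key);
        if (it == keyIndex_.end()) throw std::out_of_range("unknown key: " + key);
        return it->second;
    }

    // Convert a cell to T on access. Note that operator>> stops at whitespace,
    // so get<std::string> returns only the first word of a cell.
    template <typename T>
    T get(std::size_t row, const std::string& colName) const {
        std::istringstream in(rows_.at(row).at(column(colName)));
        T value;
        if (!(in >> value)) throw std::runtime_error("cannot convert field");
        return value;
    }

private:
    std::size_t numCols_;
    std::size_t keyCol_;
    std::map<std::string, std::size_t> colIndex_;
    std::map<std::string, std::size_t> keyIndex_;
    std::vector<Row> rows_;
};
```

With a header of "ticker" and "price" and "ticker" as the key, a call such as get<double>(rowForKey("IBM"), "price") finds the row by ticker and converts the price on the way out. Converting on access keeps the sketch short; parsing each cell once at load time, as the boost::any design does, avoids repeated conversions.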

Not to get into a language debate, but this library was really a port of something I originally did in Perl, and damned if it wasn't 10 times easier to write and 10 times more user friendly to use in Perl. I don't recommend doing CSV processing in C++ if you can help it.


To read a tab-delimited table into a vector of string vectors...

#include <vector>
#include <string>
#include <sstream>
#include <iostream>

typedef std::vector<std::string> StringVec;
typedef std::vector<StringVec> RowVec;

RowVec readRows(std::istream& f) {
    std::string line;
    RowVec rows;
    while (std::getline(f, line)) {
        rows.push_back(StringVec());
        std::string entry;
        std::istringstream linestrm(line);
        while (std::getline(linestrm, entry, '\t')) {
            rows.back().push_back(entry);
        }
    }
    return rows;
}

int main() {
    std::istringstream textFile("a\tb\tc\n1\t2\t3");
    RowVec rows = readRows(textFile);
    std::cout << rows.size() << std::endl;
    std::cout << rows[0][0] << std::endl;
    std::cout << rows[1][2] << std::endl;
    return 0;
}
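The same getline loop handles simple CSV if the delimiter is changed to ',', but it splits in the wrong place when a quoted field contains a comma. Here is a minimal sketch of a quote-aware splitter for one line, assuming RFC 4180-style quoting (fields optionally wrapped in double quotes, with "" standing for a literal quote) and no newlines inside fields. The name `splitCsvLine` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one CSV line into fields, honouring double-quoted fields.
// Assumptions: RFC 4180-style quoting, no embedded newlines.
std::vector<std::string> splitCsvLine(const std::string& line, char delim = ',') {
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';      // "" inside quotes is a literal quote
                    ++i;
                } else {
                    inQuotes = false;  // closing quote
                }
            } else {
                field += c;            // delimiters are ordinary text here
            }
        } else if (c == '"') {
            inQuotes = true;           // opening quote
        } else if (c == delim) {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field; a trailing delimiter yields an empty one
    return fields;
}
```

Calling this on each line in place of the inner getline loop would give the same vector-of-vectors result for CSV input.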


If your data is small (e.g. less than a few hundred megabytes), I would read the entire file into memory. You can store it in a string matrix like boost::numeric::ublas::matrix<std::string> or in a vector of vectors like std::vector<std::vector<std::string> >.

Boost.Spirit gives a very nice way to parse this type of text data into these structures. This boils down to a parse command like:

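// Parses one tab-separated line. No skipper is used: a whitespace skipper
// such as qi::space would consume the tab separators before the rule sees them.
boost::spirit::qi::parse(
    begin,
    end,
    // parse rule:

        *(boost::spirit::qi::char_ - '\t') % '\t',

    // end parse rule
    vec);   // vec is a std::vector<std::string> that receives the fields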

More spirit examples here: http://www.boost.org/doc/libs/1_46_0/libs/spirit/doc/html/spirit/qi/tutorials.html
