How to parse a text-based table in C++

I am trying to parse a table in the form of a text file using ifstream, and evaluating/manipulating each entry. However, I'm having trouble figuring out how to approach this because of omissions of particular items. Consider the following table:

NEW  VER  ID   NAME
1    2a   4    "ITEM ONE" (2001)
     1    7    "2 ITEM" (2002) {OCT}
     1.1  10   "SOME ITEM 3" (2003)
1         12   "DIFFERENT ITEM 4" (2004)
1    a4   16   "ITEM5" (2005) {DEC}

As you can see, sometimes the "NEW" column has nothing in it. What I want to do is take note of the ID, the name, the year (in brackets), and note whether there are braces or not afterwards.

When I started doing this, I looked for a "split" function, but I realized that it would be a bit more complicated because of the aforementioned missing items and the titles becoming separated.

The one thing I can think of is reading each line word by word, keeping track of the latest number I saw. Once I hit a quotation mark, make note that the latest number I saw was an ID (if I used something like a split, the array position right before the quotation mark), then keep record of everything until the next quote (the title), then finally, start looking for brackets and braces for the other information. However, this seems really primitive and I'm looking for a better way to do this.

I'm doing this to sharpen my C++ skills and work with larger, existing datasets, so I'd like to use C++ if possible, but if another language (I'm looking at Perl or Python) makes this trivially easy, I could just learn how to interface a different language with C++. What I'm trying to do now is just sifting data anyways which will eventually become objects in C++, so I still have chances to improve my C++ skills.

EDIT: I also realize that this is possible to complete using only regex, but I'd like to try using different methods of file/string manipulation if possible.


If the column offsets are truly fixed (no tabs, just plain space characters, 0x20), I would read the file one line at a time (std::getline) and break each line into a set of four strings at the fixed offsets (std::string::substr).

Then postprocess each 4-tuple of strings as required.

I would not hard-code the offsets; instead, store them in a separate input file that describes the format of the input, like a table description in SQL Server or another database.
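A minimal sketch of the fixed-offset idea, assuming the columns of the sample table start at offsets 0, 5, 10 and 15. The names Column, trim and split_fixed are made up for illustration; the offsets are passed in as data, so they could just as well be loaded from a separate format file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical column description: start offset and width
// (std::string::npos as width means "rest of the line").
struct Column {
    std::size_t start;
    std::size_t width;
};

// Remove leading and trailing spaces from a field.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Cut one line into trimmed fields using the given offsets.
std::vector<std::string> split_fixed(const std::string& line,
                                     const std::vector<Column>& columns) {
    std::vector<std::string> fields;
    for (const Column& c : columns) {
        // substr throws if start > size(), so short lines yield empty fields.
        fields.push_back(c.start < line.size()
                             ? trim(line.substr(c.start, c.width))
                             : std::string());
    }
    return fields;
}
```

Called with the columns {0,5}, {5,5}, {10,5}, {15,npos}, each data line comes back as four strings, and a missing NEW or VER entry simply shows up as an empty string. The NAME field still needs the postprocessing mentioned above to pull out the title, year and braces.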


Something like this:

  1. Read the first line, find "ID", and store the index.
  2. Read each data line using std::getline().
  3. Create a substring from a data line, starting at the index where you found "ID" in the header line. Use it to initialize a std::istringstream.
  4. Read the ID using iss >> an_int.
  5. Search for the first " and then the second "; the name is the text between them. Search for the ( and remember its index. Search for the ) and remember that index, too. Create a substring from the characters between those two indexes and use it to initialize another std::istringstream. Read the year from this stream.
  6. Search for the braces.
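
The steps above could be sketched roughly as follows. Item and parse_line are hypothetical names, and the code assumes that names never contain an embedded quotation mark and that the year is always a plain integer in parentheses after the name.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical record holding the pieces the question asks for.
struct Item {
    int id = 0;
    std::string name;
    int year = 0;
    bool has_braces = false;
};

// Parse one data line; id_col is the index of "ID" in the header line (step 1).
// Returns false if the line does not match the expected layout.
bool parse_line(const std::string& line, std::size_t id_col, Item& out) {
    if (id_col >= line.size()) return false;

    // Steps 3-4: read the ID from the substring starting at the ID column.
    std::istringstream id_stream(line.substr(id_col));
    if (!(id_stream >> out.id)) return false;

    // Step 5: the name sits between the first two quotation marks.
    const std::size_t q1 = line.find('"', id_col);
    if (q1 == std::string::npos) return false;
    const std::size_t q2 = line.find('"', q1 + 1);
    if (q2 == std::string::npos) return false;
    out.name = line.substr(q1 + 1, q2 - q1 - 1);

    // Step 5, continued: the year sits between ( and ) after the name.
    const std::size_t open = line.find('(', q2);
    if (open == std::string::npos) return false;
    const std::size_t close = line.find(')', open);
    if (close == std::string::npos) return false;
    std::istringstream year_stream(line.substr(open + 1, close - open - 1));
    if (!(year_stream >> out.year)) return false;

    // Step 6: note whether an opening brace follows the year.
    out.has_braces = line.find('{', close) != std::string::npos;
    return true;
}
```

In use, you would read the header with std::getline, compute id_col with header.find("ID"), and then call parse_line on each following line. Because parsing starts at the ID column, the missing NEW and VER entries never get in the way.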