开发者

C++, subtract certain strings?

This is a homework, thus I hope you guys don't give me the direct answers/code, but guide me to the solution.

My problem is, I have this XXX.html file, inside have thousands of codes. But what I need is to extract this portion:

<开发者_JS百科;html>
...
<table>
    <thead>
        <tr>
            <th class="xxx">xxx</th>
            <th>xxx</th>                       <th>xxx</th>         </tr>
    </thead>
    <tbody>
        <tr class=xxx>
        <td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td>        <td class="xxx">ZZZZ</td>    </tr>    <tr class=xxx>
<td class="xxx"><a href="xxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td>        <td class="xxx">ZZZZ</td>    </tr>    <tr class=xxx>
<td class="xxxx"><a href="xxxx" >ZZZ ZZ ZZZ</a></td>
<td>ZZZZ</td>        <td class="xxxx">zzzz</td>    </tr>    <tr class=xxx>
<td class="xxx"><a href="xxxx" >ZZZ ZZ ZZZ</a></td>
    ... and so on

This is my current codes so far:

// after open the file
while(!fileOpened.eof()){
        getline(fileOpened, reader);
        if(reader.find("ZZZ")){
            cout << reader << endl;
        }
    }

The "reader" is a string variable that I want to hold for each line of the HTML file. If the value of ZZZZ, as I need to get live, the value will change, what method should I use instead of using "find" method? (I am really sorry, for not mention this part)

But instead of display the value that I want, it display the some others portion of the html file. Why? Is my method wrong? If my method is wrong, how do I extract the ZZZZZ value?


std::string::find does not return a boolean value. It returns an index into the string where the substring match occurs if it is successful, else it returns std::string::npos.

So you would want to say:

    if (reader.find("ZZZ") != std::string::npos){
        cout << reader << endl;
    }


In general using string matching just won't work to extract values from an HTML file. A proper HTML parser would be required -- they are available for C++ as standard code.

Otherwise I'd suggest using a regex library (boost::regex until C++0x comes out). You'll be able to write better expressions to capture the part of the file you are interested in.

Reading by line probably won't work since an HTML file could be one large line. Outputing then each line you find will simply emit the entire file. Thus try the regexes and look for small sections of the code and output those. The regex library will have a "match all" command (I forgot the exact name).


The skeleton code for reading lines from a file should look like this:

if( !file.good() )
  throw "opening file failed!";

for(;;) {
  std::string line;
  std::getline(file, line);
  if( !file.good() )
    break;
  // reading succeeded, process line
}

if(!file.eof())
  // error before reaching EOF

(That funny looking loop is one that checks for the ending condition in the middle of the loop. There is not such thing in C++, so you have to use an endless loop with a break in the middle.)

However, as I said in a comment to your question, reading HTML code line-by-line isn't necessarily useful, as HTML doesn't rely on specific whitespaces.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜