Parse Website Data in C++
I am trying to develop a program that will parse a website for data and store that data in variables that I can then use in functions inside the program.
Specifically, I'm trying to parse this page (click the Debuffs tab):
http://worldoflogs.com/reports/rt-1smdoscr7neq0k6b/spell/94075/
The source is pretty simple and looks like this.
<td><a href='/reports/rt-1smdoscr7neq0k6b/details/62/' class='actor'><span class='Warrior'>Zonnza</span></a></td>
<td>100</td>
</tr>
<tr>
<td><a href='/reports/rt-1smdoscr7neq0k6b/details/3/' class='actor'><span class='DeathKnight'>Fillzholez</span></a></td>
<td>89</td>
</tr>
I only want the numbers and the names, i.e. what is between the <td></td>
tags and between the <span class=''></span>
tags. Is there any way to do what I'm looking for?
Any help would be greatly appreciated.
I'd look into Tag Soup. It's a parser for HTML that can cope with all the horrible HTML that's out there. There's a C++ port of it available too (haven't used that so can't comment on how stable it is).
There are no C++ libraries for what you're trying to do (unless you're going to link with half of Mozilla or WebKit), but you can consider using Java with HTMLUnit.
And for those suggesting regular expressions, an obligatory reference.
There's no need to use C++ when C-style sscanf will do, or even Perl
or any other language with regular expression support.
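For what it's worth, here is a rough sketch of the sscanf approach applied to the rows quoted in the question. It assumes the page has already been downloaded into a string (fetching it is a separate problem, e.g. with something like libcurl), that every name sits in a <span> that has at least one attribute, and that the number is in a <td> on a line of its own, as in the snippet. The names Row, parseName, parseValue and parseRows are made up for this example. It only works because the markup is so regular; for arbitrary HTML a real parser is the safer choice.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// One table row: the player name and the number in the next cell.
struct Row {
    std::string name;
    int value;
};

// Reads NAME from "<span class='...'>NAME</span>" anywhere in the line.
// Assumes the span has at least one attribute, as in the question.
bool parseName(const std::string& line, std::string& out) {
    const std::string::size_type pos = line.find("<span");
    if (pos == std::string::npos) return false;
    char buf[64];
    // %*[^>] skips the attributes; %63[^<] copies text up to the next tag.
    if (std::sscanf(line.c_str() + pos, "<span%*[^>]>%63[^<]", buf) != 1) return false;
    out = buf;
    return true;
}

// Reads N from a line of the form "<td>N</td>".
bool parseValue(const std::string& line, int& out) {
    return std::sscanf(line.c_str(), " <td>%d", &out) == 1;
}

// Pairs each name with the first numeric cell that follows it.
std::vector<Row> parseRows(const std::string& html) {
    std::vector<Row> rows;
    std::istringstream in(html);
    std::string line, name;
    bool haveName = false;
    int value = 0;
    while (std::getline(in, line)) {
        if (parseName(line, name)) {
            haveName = true;
        } else if (haveName && parseValue(line, value)) {
            rows.push_back(Row{name, value});
            haveName = false;
        }
    }
    return rows;
}
```

Passing the two rows from the question to parseRows should give back two entries, Zonnza with 100 and Fillzholez with 89, which you can then hand to whatever functions need them.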