RegEx: Extract a Number out of Source Code
I am no RegEx expert. I need to extract a certain number out of an HTML table.
An example:
<td>13</td><td>
</td><td align="right">29.543</td>
<td align="right">1.777</td>
<td align="right">2.588</td>
</tr><tr><td><a href="player.php?p=84668" >Caterdamus</a></td>
<td>7</td><td>
Meister</td><td align="right">9.874</td>
<td align="right">1.716</td>
<td align="right">5.791</td>
</tr><tr><td><a href="player.php?p=87216" >grappa</a></td>
<td>2</td><td>
</td><td align="right">1.044</td>
<td align="right">21</td>
<td align="right">146</td>
</tr></table>
The pattern looks like this:
<td>13</td><td>
<td>7</td><td>
<td>2</td><td>
How do I extract the numbers out of the text and store them in a variable? Hint: the numbers are positive integers.
Thanks:)
I wouldn't use regular expressions to parse HTML or XML. Instead, I would load the document into an HTML DOM parser - you can find several open source ones here. I can't vouch for any of these - I've never worked with anything other than XML in Java.
I don't know Java regex exactly, but I'd suggest something like
/<td>(\d+)<\/td><td>/
since regex syntax is quite similar across languages.
Explanations
(...): captures the content inside the parentheses into the regex's return variables (a capture group)
\d: represents a digit
+: stands for one or more occurrences of the token on its left side
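For Java specifically, the pattern above could be applied roughly as in the sketch below. The class name TdNumberExtractor and the method extractNumbers are made up for illustration, and the input string is just a shortened copy of the table from the question. Two Java details differ from the /.../ notation: there are no slash delimiters, so the / needs no escaping, and the backslash in \d has to be doubled inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class TdNumberExtractor {
    // Same pattern as suggested above, written as a Java string literal.
    private static final Pattern TD_NUMBER = Pattern.compile("<td>(\\d+)</td><td>");

    // Returns every integer found in a bare <td>...</td> cell that is
    // directly followed by another bare <td>.
    static List<Integer> extractNumbers(String html) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = TD_NUMBER.matcher(html);
        while (m.find()) {
            // group(1) is the text captured by (\d+)
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Shortened copy of the sample table from the question.
        String html = "<td>13</td><td>\n</td><td align=\"right\">29.543</td>\n"
            + "</tr><tr><td><a href=\"player.php?p=84668\" >Caterdamus</a></td>\n"
            + "<td>7</td><td>\nMeister</td><td align=\"right\">9.874</td>\n"
            + "</tr><tr><td><a href=\"player.php?p=87216\" >grappa</a></td>\n"
            + "<td>2</td><td>\n</td><td align=\"right\">1.044</td>\n";
        System.out.println(extractNumbers(html)); // prints [13, 7, 2]
    }
}
```

The cells with align="right" are skipped because the pattern only matches a bare <td>. Integer.parseInt assumes the numbers fit into an int; for longer digit runs Long.parseLong or BigInteger would be the safer choice.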
Since you use only positive integers, you don't have to care about signs and decimal points.
<td>(\d+)</td>
should do the job.