RegEx: Extract a Number out of HTML Source Code

I am no RegEx expert. I need to extract certain numbers out of an HTML table.

An example:

<td>13</td><td>
  </td><td align="right">29.543</td>
  <td align="right">1.777</td>
  <td align="right">2.588</td>
</tr><tr><td><a href="player.php?p=84668" >Caterdamus</a></td>
  <td>7</td><td>
  Meister</td><td align="right">9.874</td>
  <td align="right">1.716</td>
  <td align="right">5.791</td>
</tr><tr><td><a href="player.php?p=87216" >grappa</a></td>
  <td>2</td><td>
  </td><td align="right">1.044</td>
  <td align="right">21</td>
  <td align="right">146</td>
</tr></table>

The pattern looks like this:

<td>13</td><td>
<td>7</td><td>
<td>2</td><td>

How do I extract the numbers from the text and store them in a variable? Hint: the numbers are positive integers.

Thanks :)


I wouldn't use regular expressions to parse HTML or XML. Instead, I would load the document into an HTML DOM parser - you can find several open source ones here. I can't vouch for any of these - I've never worked with anything other than XML in Java.


I don't know Java regex exactly, but I'd suggest something like

/<td>(\d+)<\/td><td>/

since regex syntax is quite similar across languages.

Explanations

  • ( ... ) captures whatever the enclosed part matches into a capture group, which you can read back after the match
  • \d represents a digit
  • + stands for one or more occurrences of the token on its left side

Since you only have positive integers, you don't have to care about signs or decimal points.
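For what it's worth, here is a minimal sketch of how that pattern might be used from Java with java.util.regex. The class name TdNumberExtractor and the method extractNumbers are made up for illustration, and the sample string is a shortened copy of the table from the question. In Java you leave off the surrounding slashes, and the backslash in \d has to be doubled inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class TdNumberExtractor {
    // <td>, one or more digits (captured as group 1), </td>, then an opening <td>.
    private static final Pattern CELL = Pattern.compile("<td>(\\d+)</td><td>");

    // Returns every captured number in document order.
    static List<Integer> extractNumbers(String html) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            // group(1) is the text matched by (\d+).
            // Assumes the values fit in an int; use Long.parseLong otherwise.
            numbers.add(Integer.parseInt(m.group(1)));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Shortened version of the table from the question.
        String html =
              "<td>13</td><td>\n"
            + "  </td><td align=\"right\">29.543</td>\n"
            + "</tr><tr><td><a href=\"player.php?p=84668\" >Caterdamus</a></td>\n"
            + "  <td>7</td><td>\n"
            + "  Meister</td><td align=\"right\">9.874</td>\n"
            + "</tr><tr><td><a href=\"player.php?p=87216\" >grappa</a></td>\n"
            + "  <td>2</td><td>\n"
            + "  </td><td align=\"right\">1.044</td>\n"
            + "  <td align=\"right\">21</td>\n"
            + "</tr></table>";

        List<Integer> numbers = extractNumbers(html);
        System.out.println(numbers);
    }
}
```

The cells with align="right" are skipped because the pattern requires a bare <td> with no attributes. If the markup ever changes (extra whitespace, attributes on the cell), the pattern will silently stop matching, which is the usual argument for a real HTML parser.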


<td>(\d+)</td>

should do the job.
