PHP Regexp: Subpattern that might occur more than once

I'm trying to write a regular expression for HTML code that looks like this:

<tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
</tr>
<tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
    <td>7181</td>
</tr>

Now I want an expression that finds every table row and can handle a variable number of ([0-9]{4}) values. If a row has two such cells, I'd like an array with the two values; if it has three, all three values should be in my array.

My regexp HAS TO start and end with:

!<tr> ..... </tr>!sU

Is that possible?


This should help you get started:

$html = '...'; // the HTML from the question
preg_match_all('~<tr>.+?(\d+).+?</tr>~si', $html, $matches);
print_r($matches);
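Here is a self-contained run of that snippet against the question's markup. It shows why this is only a starting point: with a single capture group, each row yields just its first number.

```php
<?php
// The two rows from the question.
$html = <<<HTML
<tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
</tr>
<tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
    <td>7181</td>
</tr>
HTML;

preg_match_all('~<tr>.+?(\d+).+?</tr>~si', $html, $matches);

// $matches[0] holds each whole row; $matches[1] holds one number per row.
print_r($matches[1]);
```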


Now I want an expression that finds every table row and can handle a variable number of ([0-9]{4}) values. If a row has two such cells, I'd like an array with the two values; if it has three, all three values should be in my array. (...) Is that possible?

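No, it's not. A pattern has a fixed number of capturing subpatterns, and when you repeat a group with a quantifier, such as (?:<td>(\d{4})</td>)+, PCRE keeps only the value captured by the last repetition.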
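The following minimal sketch uses a made-up one-line row to show the "last repetition wins" behaviour. The group matches three cells, but only one capture survives.

```php
<?php
// Hypothetical compact row with three numeric cells.
$row = '<tr><td>1234</td><td>1231</td><td>7181</td></tr>';

// The group (\d{4}) is repeated by +, but it is still only group 1.
preg_match('!<tr>(?:<td>(\d{4})</td>)+</tr>!', $row, $m);

// $m[0] is the whole row, $m[1] is the value from the final repetition.
print_r($m);
```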

My regexp HAS TO start and end with:
!<tr> ..... </tr>!sU

Why is that?

If you really want to use regular expressions instead of an XML parser or something more forgiving like Tidy, I suggest a two-step approach.

First step: Find <tr> rows:

!<tr>(.*?)</tr>!s
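Because the question insists on the sU modifiers, one detail matters for this step. U inverts greediness, so under sU the lazy form is written (.*), while (.*?) becomes greedy and swallows everything from the first <tr> to the last </tr>. The following minimal sketch uses a shortened, made-up sample to show the difference.

```php
<?php
// Hypothetical two-row sample, shortened for illustration.
$html = "<tr>\n<td>1234</td>\n</tr>\n<tr>\n<td>1231</td>\n<td>7181</td>\n</tr>";

// Under U, (.*) is lazy: one match per row.
preg_match_all('!<tr>(.*)</tr>!sU', $html, $lazy);

// Under U, (.*?) is greedy: a single match spanning both rows.
preg_match_all('!<tr>(.*?)</tr>!sU', $html, $greedy);

echo count($lazy[1]), ' ', count($greedy[1]), "\n"; // prints "2 1"
```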

Second step: Iterate over the results and look for <td>s:

!<td>(?:<[^>]+>)*(\d{4})(?:<[^>]+>)*</td>!

This finds sequences of four decimal digits (0-9) inside a <td> and also tolerates formatting tags wrapped around the number, like

<td><strong>1234</strong></td>
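The following sketch combines the two steps and runs them over the question's markup, building one array per row. It uses the s modifier in step one so that . can cross the newlines inside each row.

```php
<?php
// The two rows from the question.
$html = <<<HTML
<tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
</tr>
<tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
    <td>7181</td>
</tr>
HTML;

$rows = [];

// Step 1: capture the inner HTML of each <tr>.
preg_match_all('!<tr>(.*?)</tr>!s', $html, $trMatches);

foreach ($trMatches[1] as $rowHtml) {
    // Step 2: collect every four-digit cell value in this row,
    // skipping any formatting tags wrapped around the number.
    preg_match_all('!<td>(?:<[^>]+>)*(\d{4})(?:<[^>]+>)*</td>!', $rowHtml, $tdMatches);
    $rows[] = $tdMatches[1];
}

print_r($rows);
```

The text cell is skipped because \d{4} cannot match "I'm some text". The second row therefore yields three values and the first yields two.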


Regular expressions are notoriously bad at parsing hierarchical structures, and XML especially. You are much better off using SimpleXML, or DOMDocument with DOMXPath.

See http://www.php.net/manual/en/simplexmlelement.xpath.php for how to use XPath with SimpleXML

and

http://www.php.net/manual/en/domxpath.evaluate.php for how it can be done with DOMXPath.
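As a rough illustration of the DOMXPath route, here is a minimal sketch. It assumes the rows sit inside a <table> and loads them with DOMDocument::loadHTML(), which tolerates imperfect HTML. The XPath predicate . = number(.) keeps only cells whose text parses as a number.

```php
<?php
$html = <<<HTML
<table>
  <tr><td>I'm some text</td><td>1234</td><td>1231</td></tr>
  <tr><td>I'm some text</td><td>1234</td><td>1231</td><td>7181</td></tr>
</table>
HTML;

// Collect libxml parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//tr') as $tr) {
    $values = [];
    // Relative query: only <td> children of this row whose text is numeric.
    foreach ($xpath->query('td[. = number(.)]', $tr) as $td) {
        $values[] = $td->textContent;
    }
    $rows[] = $values;
}

print_r($rows);
```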

Note that if your case is as simple as the one in the question, SimpleXML is the better choice. In some cases DOMDocument is more appropriate, for example because DOMDocument::loadHTML() accepts malformed HTML that SimpleXML would reject. More information about your input would help with that decision.

For example:

<?php
$string = <<<XML
<table>
  <tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
  </tr>
  <tr>
    <td>I'm some text</td>
    <td>1234</td>
    <td>1231</td>
    <td>7181</td>
  </tr>
</table>
XML;

$xml = new SimpleXMLElement($string);

/* Select every <td> whose text is numeric */
$result = $xml->xpath('//tr/td[text() = number(text())]');

// each() was removed in PHP 8; foreach works in every version.
foreach ($result as $node) {
    echo $node, "\n";
}

?>
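The query above returns one flat list of cells. To get one array per row, as the question asks, you can run a relative XPath query against each <tr>. The following is a minimal sketch, using the same table written on fewer lines.

```php
<?php
$string = <<<XML
<table>
  <tr><td>I'm some text</td><td>1234</td><td>1231</td></tr>
  <tr><td>I'm some text</td><td>1234</td><td>1231</td><td>7181</td></tr>
</table>
XML;

$xml = new SimpleXMLElement($string);

$rows = [];
foreach ($xml->tr as $tr) {
    // Relative XPath, evaluated with this <tr> as the context node.
    $cells = $tr->xpath('td[. = number(.)]');
    // Cast each SimpleXMLElement to its string content.
    $rows[] = array_map('strval', $cells);
}

print_r($rows);
```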