Using preg_match_all to get items from HTML
I have a number of items in a table, formatted like this
<td class="product highlighted">
Item Name
</td>
and I am using the following PHP code
$regex_pattern = "/<td class=\"product highlighted\">(.*)<\/td>/";
preg_match_all($regex_pattern,$buffer,$matches);
print_r($matches);
I am not getting any matches, yet I can see the items in the HTML.
Is there something wrong with my regexp?
Apart from your using regex to parse HTML, yes, there is something wrong: by default, the dot doesn't match newlines.
So you need to use
$regex_pattern = "/<td class=\"product highlighted\">(.*?)<\/td>/s";
The /s modifier allows the dot to match any character, including newlines. Note the reluctant quantifier .*? to avoid matching more than one tag at once.
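To see the effect of both changes side by side, here is a minimal sketch. The contents of $buffer are made-up sample data modelled on the question, and the variable names $without_s, $with_s and $greedy are only for illustration.

```php
<?php
// Hypothetical sample input: two cells laid out like the one in the question.
$buffer = <<<HTML
<table><tr>
<td class="product highlighted">
Item One
</td>
<td class="product highlighted">
Item Two
</td>
</tr></table>
HTML;

// No /s: the dot cannot cross the newlines inside each cell, so nothing matches.
$without_s = preg_match_all('/<td class="product highlighted">(.*?)<\/td>/', $buffer, $m1);

// /s plus the reluctant quantifier: each cell is captured on its own.
$with_s = preg_match_all('/<td class="product highlighted">(.*?)<\/td>/s', $buffer, $m2);

// /s with a greedy quantifier: a single match runs from the first <td> to the last </td>.
$greedy = preg_match_all('/<td class="product highlighted">(.*)<\/td>/s', $buffer, $m3);

// The captured text still carries the surrounding newlines, so trim it.
$items = array_map('trim', $m2[1]);
print_r($items);
```

With this sample data the three calls should find 0, 2 and 1 matches respectively, which is why both the modifier and the reluctant quantifier matter.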
In order to match your example, you will need to add the dot-all flag, s, so the . will match newlines.
Try the following.
$regex_pattern = "/<td class=\"product highlighted\">(.*?)<\/td>/s";
Also note that I changed the capture to non-greedy, (.*?). It's best to do so when matching open-ended text.
It's worth noting that regular expressions are not the right tool for HTML parsing; you should look into DOMDocument. However, for such a simple match you can get away with regular expressions, provided your HTML is well-formed.
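For comparison, a rough sketch of the DOMDocument route might look like the following. It assumes PHP's DOM extension is available; the sample $buffer and the exact-match XPath expression on the class attribute are assumptions made for illustration, not something taken from the question.

```php
<?php
// Hypothetical sample input, including one cell that should be skipped.
$buffer = '<table><tr><td class="product highlighted">
Item One
</td><td class="other">Skip</td><td class="product highlighted">
Item Two
</td></tr></table>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$doc->loadHTML($buffer);
libxml_clear_errors();

// Select only the cells whose class attribute is exactly "product highlighted".
$xpath = new DOMXPath($doc);
$cells = $xpath->query('//td[@class="product highlighted"]');

$items = [];
foreach ($cells as $cell) {
    // textContent includes the surrounding newlines, so trim it.
    $items[] = trim($cell->textContent);
}
print_r($items);
```

Unlike the regex, this does not depend on how the cell contents are split across lines, though the exact attribute comparison would miss cells whose classes appear in a different order.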