Regex matching table rows in HTML [duplicate]
Possible Duplicate:
Best methods to parse HTML with PHP
I'm having a bit of trouble matching table rows with preg. Here is my expression:
<TR[a-z\=\"a-z0-9 ]*>([\{\}\(\)\^\=\$\&\.\_\%\#\!\@\=\<\>\:\;\,\~\`\'\*\?\/\+\|\[\]\|\-a-zA-Z0-9À-ÿ\n\r ]*)<\/TR>
As you can see, it tries to match everything between the TR tags (including all symbols). That part works great; however, when dealing with multiple table rows, it often returns several rows as ONE match rather than one match per row:
<TR>
<TD>test</TD>
</TR>
<TR>
<TD>test2</TD>
</TR>
yields:
Array
(
[0] => <TD>test</TD>
<TD>test2</TD>
)
rather than what I want:
Array
(
[0] => <TD>test</TD>
[1] => <TD>test2</TD>
)
I realize the reason for this is that the character class also matches the symbols that make up the TR tags themselves, so the search greedily consumes the rest of the rows until it hits the last closing tag.
So basically, I'm wondering if someone can help me extend the expression so that it excludes anything containing "TR" between the TR tags, to prevent it from matching multiple rows.
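Use lazy matching in your regex: <tr.*?</tr> (add the i modifier so upper-case tags match, and the s modifier so that . also matches the newlines inside a row).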
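For illustration, here is a minimal sketch of the lazy pattern applied to the sample rows from the question. The `~` delimiters, the `[^>]*` part that tolerates attributes on the opening tag, and the variable names are my own choices, not something from the original post.

```php
<?php
// Sample input taken from the question.
$html = "<TR>\n<TD>test</TD>\n</TR>\n<TR>\n<TD>test2</TD>\n</TR>";

// .*? is lazy, so each match stops at the first closing </tr> it finds.
// i: tags match in any letter case; s: "." also matches newlines inside a row.
preg_match_all('~<tr\b[^>]*>(.*?)</tr>~is', $html, $matches);

// $matches[1] holds the captured row contents, one entry per row.
$rows = array_map('trim', $matches[1]);
print_r($rows);
```

With the sample input, `$rows` should contain two entries, `<TD>test</TD>` and `<TD>test2</TD>`. Note that this still breaks on nested tables, since the first `</tr>` found may belong to an inner row.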
But as others have mentioned, it's more robust to use a proper parser if you can.
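As a rough sketch of the parser route, something like the following should work with PHP's bundled DOM extension (assuming it is enabled). The wrapping `<table>` element is an assumption I added so the fragment is valid table markup, and the variable names are just for illustration.

```php
<?php
// The question's rows, wrapped in a <table> (assumption) so the markup is valid.
$html = "<table>\n<TR>\n<TD>test</TD>\n</TR>\n<TR>\n<TD>test2</TD>\n</TR>\n</table>";

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Build one array of cell texts per table row.
$rows = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);
```

Each row comes back as its own array of cell texts, so there is no greedy versus lazy question to worry about, and attributes or nested markup inside the cells do not need special handling.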
Try using global search:
preg_match_all("/<td>([^<]+)/i", $html, $matches);
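To show what ends up in $matches, here is a small sketch run against the rows from the question. The i modifier is there so the upper-case <TD> tags in that sample match; the variable names are only illustrative.

```php
<?php
// Sample input taken from the question.
$html = "<TR>\n<TD>test</TD>\n</TR>\n<TR>\n<TD>test2</TD>\n</TR>";

// [^<]+ stops at the next "<", so each match covers the text of a single cell.
// i: lets the pattern match <TD> as well as <td>.
preg_match_all('/<td>([^<]+)/i', $html, $matches);

// $matches[0] has the full matches; $matches[1] has just the captured cell text.
$cells = $matches[1];
print_r($cells);
```

This collects cell text rather than whole rows, and it only matches a plain <td> tag with no attributes and no nested markup inside the cell.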