Regular expression to match an HTML tag with specific contents
I am trying to write a regular expression to capture this string:
<td style="white-space:nowrap;">###.##</td>
I can't even match it if I include the string as it is in the regex pattern!
I am using preg_match_all(), however, I am not finding the correct pattern. I am thinking that "white-space:nowrap;" is throwing off the matching in some way. Any ideas? Thanks.
Why not try DOMDocument instead? Then you do not have to worry about the HTML being well formed. Using the DOM classes also improves readability, and they should perform well because the extension is part of the PHP core rather than living in user space.
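As a rough sketch of the DOM approach (not tested against the asker's real page): the sample markup, the variable names, and the exact-match XPath on the style attribute are assumptions made for illustration. The cell is wrapped in a table so the parser keeps it in place.

```php
<?php
// Sample markup (assumed): the cell from the question inside a minimal table.
$html = '<table><tr><td style="white-space:nowrap;">###.##</td><td>other</td></tr></table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <td> whose style attribute is exactly the one in the question.
$xpath = new DOMXPath($doc);
$values = [];
foreach ($xpath->query('//td[@style="white-space:nowrap;"]') as $td) {
    $values[] = trim($td->textContent);
}

print_r($values);
```

If the real style attribute carries extra declarations, an XPath contains() test on @style would likely be a better fit than the exact comparison used here.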
When I'm having problems with regular expressions, I like to test them in real time with one of the following websites:
- preg_match Regular Expression Tester
- Regular Expression Test Tool
Did you see any warnings? You have to escape some bits of that, namely the / before the td close tag, because / is also the regex delimiter here. This seemed to work for me:
php > $string='cow cow cow <td style="white-space:nowrap;">###.##</td> cat cat cat cat';
php > preg_match_all('/<td style="white-space:nowrap;">###\.##<\/td>/',$string,$result);
php > var_dump($result);
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(43) "<td style="white-space:nowrap;">###.##</td>"
  }
}
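If the ###.## in the question is a placeholder for a decimal number (that is my assumption, the question does not say), a capture group pulls out just the value. The sample input and variable names below are made up for illustration.

```php
<?php
// Assumed sample input: two cells holding decimal numbers.
$string = 'x <td style="white-space:nowrap;">123.45</td> y <td style="white-space:nowrap;">7.00</td>';

// (\d+\.\d+) captures digits, a literal dot, then more digits.
// The / in the closing tag is escaped because / is the delimiter.
$count = preg_match_all('/<td style="white-space:nowrap;">(\d+\.\d+)<\/td>/', $string, $result);

// $result[0] holds the full matches, $result[1] the captured numbers.
print_r($result[1]);
```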
Are you aware that the regex argument to any of PHP's preg_ functions has to be double-delimited? For example:
preg_match_all('/foo/', $target, $results)
'...' are the string delimiters, /.../ are the regex delimiters, and the actual regex is foo. The regex delimiters don't have to be slashes, they just have to match; some popular choices are #...#, %...% and ~...~. They can also be balanced pairs of bracketing characters, like {...}, (...), [...], and <...>; those are much less popular, and for good reason.
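To make the delimiter point concrete, here is a small sketch that runs the same regex, a/b, under several delimiters. The target string and variable names are invented for the example.

```php
<?php
// Invented target text containing the literal "a/b" twice.
$target = 'a/b and once more a/b';

$patterns = [
    '/a\/b/', // slash delimiters: the slash inside the regex must be escaped
    '#a/b#',  // any other delimiter leaves the slash alone
    '~a/b~',
    '{a/b}',  // a balanced bracket pair also works as delimiters
];

$counts = [];
foreach ($patterns as $pattern) {
    // preg_match_all() returns the number of full matches found.
    $counts[$pattern] = preg_match_all($pattern, $target, $matches);
}

print_r($counts);
```

All four patterns describe the same regex, so each should report the same number of matches.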
If you leave out the regex delimiters, the regex-compilation phase will probably fail and the error message will probably make no sense. For example, this code:
preg_match_all('<td style="white-space:nowrap;">###.##</td>', $s, $m)
...would generate this message:
Unknown modifier '#'
It tries to use the first pair of angle brackets as the regex delimiters, and whatever follows the > as the regex modifiers (e.g., i for case-insensitive, m for multiline). To fix that, you would add real regex delimiters, like so:
preg_match_all('%<td style="white-space:nowrap;">###\.##</td>%i', $s, $m)
The choice of delimiter is a matter of personal preference and convenience. If I had used # or /, I would have had to escape those characters in the actual regex. I escaped the . because it's a regex metacharacter. Finally, I added the i modifier to demonstrate the use of modifiers and because HTML tag and attribute names aren't case sensitive.
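If you would rather not escape things by hand, preg_quote() can do it for you, including the delimiter when you pass it as the second argument. The helper below is hypothetical (count_literal is my own name, not a PHP function) and is only a sketch of the idea.

```php
<?php
// Hypothetical helper: count case-insensitive literal occurrences of $needle.
function count_literal(string $needle, string $haystack): int
{
    // preg_quote() escapes regex metacharacters such as the dot; the second
    // argument makes it escape the chosen delimiter (%) as well.
    $pattern = '%' . preg_quote($needle, '%') . '%i';

    $count = preg_match_all($pattern, $haystack, $matches);
    if ($count === false) {
        // preg_match_all() returns false when matching fails.
        throw new RuntimeException('preg_match_all failed, error code ' . preg_last_error());
    }
    return $count;
}

$cell = '<td style="white-space:nowrap;">###.##</td>';
$string = 'cow cow cow <td style="white-space:nowrap;">###.##</td> cat cat cat cat';

var_dump(count_literal($cell, $string));
var_dump(count_literal($cell, strtoupper($string)));
```

The second call is there to show the i modifier at work: the uppercased input still matches.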