Regexp matching mismatched HTML
How do I parse a certain link style out of HTML without the match spreading across multiple links?
The exact link I am trying to match is:
href="http://www.hotmail.com' rel='external nofollow"
Pay particular attention to the mismatching of ' and " in the above.
What I have tried:
if(preg_match('|href="http(.*?)\' rel=\'(.*?)"|i', $html)){
echo "Found bad html\n";
}
However, that regexp also matches in perfectly good HTML by spanning several links. I need it to match only within a single link.
You might be able to adapt your regex by replacing the generic .*? with a negated character class like [^<"'>]+. That usually keeps it from eating up too much, because the match can no longer run past a quote or a tag boundary.
if(preg_match('| href="(http[^<"\'>]+)\' rel=\'([^<"\'>]+)"|i', $html)){
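To see the difference, here is a small sketch that runs the original lazy pattern and the negated-class pattern against two sample strings. The sample markup and the variable names ($good, $bad, $lazy, $negated) are made up for illustration; only the two patterns come from this thread.

```php
<?php
// Hypothetical sample input: three well-formed links on one line...
$good = '<a href="http://a.example/" rel="nofollow">a</a> '
      . '<a href=\'http://b.example/\' rel=\'external\'>b</a> '
      . '<a href="http://c.example/">c</a>';
// ...and the broken link from the question.
$bad  = '<a href="http://www.hotmail.com\' rel=\'external nofollow">mail</a>';

$lazy    = '|href="http(.*?)\' rel=\'(.*?)"|i';                 // pattern from the question
$negated = '| href="(http[^<"\'>]+)\' rel=\'([^<"\'>]+)"|i';    // pattern from this answer

// The lazy pattern starts in the first link and ends in the third one,
// so it reports a match even though every link is well-formed.
var_dump(preg_match($lazy, $good));     // 1 (false positive)

// The negated class cannot cross a quote, so it stays inside one attribute.
var_dump(preg_match($negated, $good));  // 0
var_dump(preg_match($negated, $bad));   // 1
```

The lazy quantifier only promises the shortest match from a given starting point; it does not stop the match from starting in one link and finishing in another.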
Better yet: don't hard-code the " and ', but use a character class to match them too:
if(preg_match('| href=["\']http([^<"\'>]+)["\']'
.' rel=["\']([^<"\'>]*)["\']|i', $html)){
(Oh, now it looks really ugly.)
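One caveat: once either quote is accepted on both sides, the pattern also matches well-formed links such as href="http://..." rel="nofollow". If the goal is still to flag only the mismatched ones, one possible approach is to capture the four quote characters and compare them afterwards. The function name has_mismatched_quotes is hypothetical and not part of the original answer; this is a rough sketch rather than a robust HTML check.

```php
<?php
// Hypothetical helper: returns true when any href/rel pair in $html
// opens an attribute value with one quote character and closes it with another.
function has_mismatched_quotes(string $html): bool
{
    // Groups 1 and 2 delimit the href value, groups 3 and 4 the rel value.
    $pattern = '| href=(["\'])http[^<"\'>]+(["\']) rel=(["\'])[^<"\'>]*(["\'])|i';

    if (!preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        return false; // no href/rel pair found (or the regex failed)
    }
    foreach ($matches as $m) {
        if ($m[1] !== $m[2] || $m[3] !== $m[4]) {
            return true;
        }
    }
    return false;
}

// Usage with the link from the question and a well-formed link:
var_dump(has_mismatched_quotes('<a href="http://www.hotmail.com\' rel=\'external nofollow">x</a>')); // true
var_dump(has_mismatched_quotes('<a href="http://www.hotmail.com" rel="external nofollow">x</a>'));   // false
```

Like the patterns above, this assumes a single space between the href and rel attributes and that rel directly follows href.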