
Regular expression syntax problem

$pattern='`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>([^<]*)</a>`isU';

I want to change ([^<]*) so that it matches everything up to the closing </a> instead of stopping at the first <, because an <img> tag could be inside the <a> tag.

Can anyone help? I'm lousy at regex.


You can use an HTML parser for PHP to do this. I wouldn't use regex at all.

You can try: http://simplehtmldom.sourceforge.net/

Although I think PHP has a DOM parser built in.
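PHP does ship a DOM extension (DOMDocument). As a minimal sketch of the parser approach, the snippet below collects the href and inner markup of every link. The sample markup and the $links array layout are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample markup; not taken from the original question.
$html = '<p><a href="/home" class="nav"><img src="logo.png" alt="Logo"> Home</a>'
      . ' and <a href=\'/about\'>About</a></p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    // Serialize the child nodes to get the markup between <a ...> and </a>.
    $inner = '';
    foreach ($a->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $links[] = [
        'href'  => $a->getAttribute('href'),
        'inner' => $inner,
        'text'  => trim($a->textContent),
    ];
}

print_r($links);
```

Because the parser tracks element boundaries itself, an <img> inside the link needs no special handling.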


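Changing ([^<]*) to an ungreedy match-all might do the trick: normally that is (.*?), but since your pattern already has the U modifier, which makes quantifiers ungreedy by default, plain (.*) is the ungreedy form here.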
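One thing worth checking before using (.*?) is that the question's pattern has the U modifier, which inverts greediness in PCRE. A quantifier followed by ? then becomes greedy rather than lazy. A small sketch of the difference, using a made-up input string:

```php
<?php
// Hypothetical input; chosen so greedy and lazy matches differ visibly.
$html = '<b>one</b><b>two</b>';

// Without U: (.*?) is lazy and stops at the first </b>.
preg_match('`<b>(.*?)</b>`s', $html, $lazy);

// With U: plain (.*) is already lazy.
preg_match('`<b>(.*)</b>`sU', $html, $lazyWithU);

// With U: (.*?) flips back to greedy and runs to the last </b>.
preg_match('`<b>(.*?)</b>`sU', $html, $greedyWithU);

echo $lazy[1], "\n";        // one
echo $lazyWithU[1], "\n";   // one
echo $greedyWithU[1], "\n"; // one</b><b>two
```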


([^<]*) could be changed to ((?:[^<]|<(?!/a>))*), which uses a negative lookahead to match either a non-< character or a < that is not followed by /a>.
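As a rough sketch, here is the lookahead version dropped into the pattern from the question and run against a made-up snippet (the sample markup is an assumption, not from the question):

```php
<?php
// Hypothetical sample markup with an <img> nested inside the first link.
$html = '<a href="/home" class="nav"><img src="logo.png" alt="Logo"> Home</a>'
      . ' <a href=\'/about\'>About</a>';

// Same pattern as in the question, with ([^<]*) replaced by the lookahead group.
$pattern = '`<a\s+[^>]*(href=([\'\"]).*\\2)[^>]*>((?:[^<]|<(?!/a>))*)</a>`isU';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] is the href attribute, $m[3] is everything between <a ...> and </a>.
    echo $m[1], ' => ', $m[3], "\n";
}
// href="/home" => <img src="logo.png" alt="Logo"> Home
// href='/about' => About
```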

HOWEVER, as stated many times over already, this is not a good way to parse HTML. Firstly, it's horribly inefficient, and secondly, what happens if you have nested tags, such as <a><a></a></a>? While this may not happen with hyperlinks, it's common among many other HTML elements.
