
Regular expression to match word instances not in html attrs or link text

I want to match a keyword only where it is not linked. In the following example, I want to match the google keyword only where it is neither between <a></a> nor inside an attribute, that is, only the last google:

<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.


Do not parse HTML with regular expressions. HTML is not a regular language. Use an HTML parser.
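
As a rough illustration of the parser route, here is a minimal sketch that walks a parsed DOM tree and collects text nodes containing the keyword while skipping everything inside an a element. The helper name findUnlinkedTextNodes is hypothetical, and the usage part assumes a browser, since DOMParser is not available in plain Node.js. Attributes are never visited because they are not child nodes.

```javascript
// Hypothetical helper: collect text nodes that contain `keyword`
// and are not descendants of an <a> element.
// `root` can be any DOM node; only standard DOM properties are used.
function findUnlinkedTextNodes(root, keyword) {
  const hits = [];
  (function walk(node) {
    if (node.nodeType === 3) {
      // Text node: record it if it mentions the keyword.
      if (node.nodeValue.includes(keyword)) hits.push(node);
      return;
    }
    if (node.nodeType === 1 && node.nodeName.toLowerCase() === "a") {
      // Element node named "a": skip the link text entirely.
      return;
    }
    for (const child of Array.from(node.childNodes || [])) walk(child);
  })(root);
  return hits;
}

// Browser-only usage (assumption: DOMParser exists in the environment).
if (typeof DOMParser !== "undefined") {
  const markup =
    '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';
  const doc = new DOMParser().parseFromString(markup, "text/html");
  const nodes = findUnlinkedTextNodes(doc.body, "google");
  console.log(nodes.map((n) => n.nodeValue));
}
```

From each returned text node, the exact positions of the keyword can then be found with ordinary string methods such as indexOf.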


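This works for me (JavaScript):

var str = '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';
var matches = str.match(/(?:<a[^>]*>[^<]*<\/a>[\s\S]*)*(google)/); // matches[1] holds the captured keyword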
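
To see what this pattern actually captures, the sketch below wraps it in a hypothetical helper, captureKeyword (not part of the original answer), that returns the captured text and its offset. Because [\s\S]* is greedy, the capture lands on the last google in the sample string. Note that when the only occurrence sits inside a link, the pattern still matches that linked occurrence, so it seems safe only for input shaped like the example.

```javascript
// Hypothetical wrapper around the pattern from the answer above.
// Returns the captured keyword and its offset in `input`, or null.
function captureKeyword(input) {
  const m = input.match(/(?:<a[^>]*>[^<]*<\/a>[\s\S]*)*(google)/);
  if (!m) return null;
  // The capture group is always the tail of the overall match,
  // so its offset is the match end minus the capture length.
  return { text: m[1], index: m.index + m[0].length - m[1].length };
}

const sampleInput =
  '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';
console.log(captureKeyword(sampleInput));

// Caveat: with no unlinked occurrence, the linked one is returned.
console.log(captureKeyword('<a href="x">google</a> nothing'));
```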



Provided you can be sure that your HTML is well behaved (and valid), and in particular that it contains no comments or nested a tags, you can try

google(?!((?!<a[\s>]).)*</a>)

That matches any "google" that is not followed by a closing a tag before the next opening a tag. But you might be better off using an HTML parser instead.
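
A minimal sketch of how the lookahead pattern could be used from JavaScript, under the same assumptions about well-behaved HTML. The slash in </a> is escaped for a regex literal and the g flag is added so every unlinked occurrence is reported; the names UNLINKED_GOOGLE and findUnlinkedOffsets are hypothetical. Since . does not match newlines here, this sketch also assumes a link and its closing tag sit on one line.

```javascript
// Pattern from the answer above, adapted to a JavaScript regex literal.
// The g flag is an addition so that matchAll can report every hit.
const UNLINKED_GOOGLE = /google(?!((?!<a[\s>]).)*<\/a>)/g;

// Hypothetical helper: offsets of every "google" that is not followed by
// a closing </a> before the next opening <a ...> tag.
function findUnlinkedOffsets(html) {
  return Array.from(html.matchAll(UNLINKED_GOOGLE), (m) => m.index);
}

const sampleHtml =
  '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';
console.log(findUnlinkedOffsets(sampleHtml));
```

On the sample string this should report a single offset, that of the final google; the occurrences in the href, the title attribute, and the link text are all followed by </a> and are therefore rejected.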
