Regular expression to match word instances not in html attrs or link text
I want to match a keyword only where it is not linked. In the following example, I want to skip every google that is either between <a></a> or inside an attribute, and match only the last google:
<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.
Do not parse HTML with regular expressions. HTML is not a regular language. Use an HTML parser.
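As a rough sketch of the parser route in JavaScript: once the markup is a DOM tree, the keyword only needs to be counted in text nodes that have no a ancestor, and attributes are never visited because they are not child nodes. The function name countUnlinked is hypothetical, and the browser's DOMParser is assumed for the actual parsing (it is not available in plain Node.js).

```javascript
// Count occurrences of `keyword` in text nodes that are not inside an <a> element.
// `root` can be any DOM-like node exposing nodeType, nodeName, nodeValue and childNodes.
// `keyword` is assumed to be a non-empty string.
function countUnlinked(root, keyword) {
  let count = 0;
  const visit = (node, insideLink) => {
    if (node.nodeType === 3) { // 3 is Node.TEXT_NODE
      if (!insideLink) {
        count += node.nodeValue.split(keyword).length - 1;
      }
      return;
    }
    // 1 is Node.ELEMENT_NODE; HTML element names are reported in upper case.
    const isLink = node.nodeType === 1 && node.nodeName.toLowerCase() === 'a';
    for (const child of Array.from(node.childNodes || [])) {
      visit(child, insideLink || isLink);
    }
  };
  visit(root, false);
  return count;
}

// Browser-only usage with the sample from the question.
if (typeof DOMParser !== 'undefined') {
  const html = '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';
  const doc = new DOMParser().parseFromString(html, 'text/html');
  console.log(countUnlinked(doc.body, 'google'));
}
```

The same walk could replace or wrap the keyword instead of counting it, since each unlinked text node is visited exactly once.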
This works for me (JavaScript):
var matches = str.match(/(?:<a[^>]*>[^<]*<\/a>[\s\S]*)*(google)/);
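A small check of this pattern against the sample from the question, as a sketch only; the variable names are illustrative. It also shows a case where the pattern appears to give a false positive: when the text does not start with a link, the optional group matches nothing and the keyword can be found inside an attribute.

```javascript
const pattern = /(?:<a[^>]*>[^<]*<\/a>[\s\S]*)*(google)/;

// Sample from the question: the string starts with the link.
const str = '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';
const m = str.match(pattern);
// m[1] is the captured keyword; m[0] runs from the start of the match through
// that keyword, so the keyword's offset is derived from the end of m[0].
const keywordOffset = m.index + m[0].length - m[1].length;

// Counter-example: the link comes later and the only "google" is in its href.
const other = 'see <a href="http://www.google.com">this</a>';
const falsePositive = other.match(pattern);
```

On the first string the capture lands on the last, unlinked google. On the second string the match is non-null even though the keyword only occurs inside the href attribute.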
Provided you can be sure that your HTML is well behaved (and valid), and in particular does not contain comments or nested a tags, you can try:
google(?!((?!<a[\s>]).)*</a>)
That matches any "google" that is not followed by a closing a tag before the next opening a tag. But you might be better off using an HTML parser instead.
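A minimal sketch of how the lookahead pattern above might be used in JavaScript, run against the sample from the question. The helper name findUnlinked is hypothetical, and the slash in the closing tag is escaped because the pattern is written as a regex literal. Note that . does not match newlines by default, so a link whose text spans several lines would need extra care.

```javascript
// Sample input taken from the question.
const sample = '<a href="http://www.google.com" title="google">google</a> is linked, google is not linked.';

// "google" that is not followed by a closing </a> before the next opening <a ...>.
const unlinked = /google(?!((?!<a[\s>]).)*<\/a>)/g;

// Return the start offsets of every unlinked occurrence in `text`.
function findUnlinked(text) {
  return Array.from(text.matchAll(unlinked), (match) => match.index);
}

const offsets = findUnlinked(sample);
console.log(offsets.map((i) => sample.slice(i, i + 20)));
```

The occurrences in the href, in the title, and in the link text are all followed by the closing tag of the same link, so the negative lookahead rejects them and only the last google remains.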