Regular expression to match characters in a string, excluding matches within HTML anchor elements
Consider this blob of text:
@"
I want to match the word 'highlight' in a string. But I don't want to match
highlight when it is contained in an HTML anchor element. The expression
should not match highlight in the following text: <a href='#'>highlight</a>
"
Here's what the output should look like (matches are in bold):
I want to match the word "**highlight**" in a string. But I don't want to match **highlight** when it is contained in an HTML anchor element. The expression should not match **highlight** in the following text: highlight
How would you construct an expression that matches all occurrences of X, excluding matches inside HTML anchor elements?
I know you asked for a regex, but I won't write one. Instead, here's a solution using Html Agility Pack.
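using System;
using System.Linq;
using HtmlAgilityPack;

public static class HighlightFinder
{
    public static void Parse()
    {
        string htmlFragment =
@"
I want to match the word 'highlight' in a string. But I don't want to match
highlight when it is contained in an HTML anchor element. The expression
should not match highlight in the following text: <a href='#'>highlight</a> more
";
        HtmlDocument htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(htmlFragment);

        // Select every text node in the fragment.
        // SelectNodes returns null when nothing matches, so guard against that.
        var textNodes = htmlDocument.DocumentNode.SelectNodes("//text()");
        if (textNodes == null) return;

        foreach (HtmlNode node in textNodes.Where(FilterTextNodes()))
        {
            Console.WriteLine(node.OuterHtml);
        }
    }

    // True for text nodes that contain "highlight" and have no <a> ancestor.
    // Checking all ancestors (not just the direct parent) also excludes
    // text nested deeper inside an anchor, e.g. <a><b>highlight</b></a>.
    private static Func<HtmlNode, bool> FilterTextNodes()
    {
        return node => node.NodeType == HtmlNodeType.Text
            && !node.Ancestors("a").Any()
            && node.OuterHtml.Contains("highlight");
    }
}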
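If a regular expression is still required, one common workaround is an alternation whose first branch consumes a whole anchor element and whose second branch captures the word; only matches where the capture group succeeded count. The following is a minimal sketch, not a replacement for the parser-based approach above. The class name `AnchorAwareHighlighter`, the method `Highlight`, and the choice to wrap matches in `<b>` tags are assumptions made for illustration. It also assumes anchors are not nested and that the word does not appear inside the attributes of other tags, which a regex cannot reliably rule out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorAwareHighlighter
{
    // Branch 1 swallows an entire <a ...>...</a> element without capturing.
    // Branch 2 captures the word in group 1 when it appears anywhere else.
    private static readonly Regex Pattern = new Regex(
        @"<a\b[^>]*>.*?</a>|(highlight)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Wraps every occurrence outside an anchor in <b>...</b>;
    // anchor elements are returned unchanged.
    public static string Highlight(string html)
    {
        return Pattern.Replace(html, m =>
            m.Groups[1].Success ? "<b>" + m.Value + "</b>" : m.Value);
    }

    public static void Main()
    {
        Console.WriteLine(Highlight("highlight <a href='#'>highlight</a> highlight"));
        // prints: <b>highlight</b> <a href='#'>highlight</a> <b>highlight</b>
    }
}
```

Because the anchor branch comes first in the alternation, the regex engine consumes the whole element before it can see the word inside it, which is what keeps that occurrence out of group 1.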