Regular expression to match characters in a string, excluding matches within HTML anchor elements

Consider this blob of text:

@"
I want to match  the word 'highlight' in a string. But I don't want to match
highlight when it is contained in an HTML anchor element. The expression
should not match highlight in the following text: <a href='#'>highlight</a>
"

Here's what the output should look like (matches are in bold):

I want to match the word "highlight" in a string. But I don't want to match highlight when it is contained in an HTML anchor element. The expression should not match highlight in the following text: highlight

How would you construct an expression that matches all occurrences of X, excluding matches inside HTML anchor elements?


I know you asked for RegEx, but I won't do it. Instead here's a solution using Html Agility Pack.

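using System;
using System.Linq;
using HtmlAgilityPack;

public static void Parse()
{
    string htmlFragment =
        @"
    I want to match  the word 'highlight' in a string. But I don't want to match
    highlight when it is contained in an HTML anchor element. The expression
    should not match highlight in the following text: <a href='#'>highlight</a> more
    ";
    HtmlDocument htmlDocument = new HtmlDocument();
    htmlDocument.LoadHtml(htmlFragment);
    // Walk every node in the parsed fragment; the filter below keeps only the relevant text nodes.
    foreach (HtmlNode node in htmlDocument.DocumentNode.SelectNodes("//.").Where(FilterTextNodes()))
    {
        Console.WriteLine(node.OuterHtml);
    }
}

// Keeps text nodes that contain the search word and whose direct parent is not an <a> element.
// Note: only the immediate parent is checked, so text nested deeper inside an anchor
// (for example <a><b>highlight</b></a>) would still pass this filter.
private static Func<HtmlNode, bool> FilterTextNodes()
{
    return node => node.NodeType == HtmlNodeType.Text
        && node.ParentNode != null
        && node.ParentNode.Name != "a"
        && node.OuterHtml.Contains("highlight");
}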
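
For completeness, if the input is known to be simple and well formed, a regex-only approach is possible using the "match what you want to skip, capture what you want to keep" trick. This is only a rough sketch, not a replacement for a real parser: the class name HighlightSketch, the method WrapMatches, and the choice of wrapping matches in <b> tags are all made up for illustration, and the pattern assumes anchors are not nested and that the word never appears inside the attributes of other tags.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HighlightSketch
{
    // First alternative consumes a whole <a ...>...</a> element without capturing.
    // Second alternative captures the word, and is only reached outside an anchor.
    private static readonly Regex Pattern = new Regex(
        @"<a\b[^>]*>.*?</a>|(highlight)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Wraps each occurrence outside an anchor in <b> tags (an assumed output format)
    // and returns anchor elements unchanged.
    public static string WrapMatches(string input)
    {
        return Pattern.Replace(input, m =>
            m.Groups[1].Success ? "<b>" + m.Value + "</b>" : m.Value);
    }
}
```

Calling HighlightSketch.WrapMatches on the sample text should wrap the occurrences of highlight that sit outside the anchor and leave <a href='#'>highlight</a> untouched. Anything messier than this (nested tags, malformed markup, the word inside attribute values) is where the parser-based solution above is the safer choice.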