Encode HTML in ASP.NET C# but leave tags intact

I need to encode an entire text while leaving the < and > characters of the tags intact.

Example:

<p>Give me 100.000 €!</p>

must become:

<p>Give me 100.000 &euro;!</p>

The HTML tags must remain intact.


Use a regular expression that matches either a complete tag or the text between tags, and encode only the text between tags:

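// Requires: using System.Text.RegularExpressions; using System.Web;
html = Regex.Replace(
  html,
  "(<[^>]+>|[^<]+)",  // a complete tag, or a run of text up to the next '<'
  m => m.Value.StartsWith("<") ? m.Value : HttpUtility.HtmlEncode(m.Value)  // keep tags, encode text
);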


You could use the Html Agility Pack and then encode the text content of each node.


Alternatively, use string.Replace for just the characters you want to encode.
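A minimal sketch of that idea, assuming you know in advance which characters need encoding. The class name `SelectiveEncoder` and its entity map are hypothetical; extend the map as needed. This is also one way to get named entities such as `&euro;`, which, as far as I know, HttpUtility.HtmlEncode does not emit for €. `&` is deliberately left out of the map: replacing it after the other characters would re-encode the entities just inserted, so if you need it, replace it in a separate step first.

```csharp
using System.Collections.Generic;

// Hypothetical helper: replaces only the characters listed in the map,
// so '<', '>' and the rest of the markup stay exactly as they were.
public static class SelectiveEncoder
{
    // Assumed character-to-entity map; add the characters you need.
    // '&' is intentionally absent so inserted entities are not re-encoded.
    private static readonly Dictionary<string, string> Entities =
        new Dictionary<string, string>
        {
            { "\u20AC", "&euro;" },   // €
            { "\u00A9", "&copy;" },   // ©
            { "\u00E9", "&eacute;" }, // é
        };

    public static string Encode(string html)
    {
        if (html == null) return null;

        foreach (var pair in Entities)
        {
            // string.Replace returns a new string with every occurrence swapped.
            html = html.Replace(pair.Key, pair.Value);
        }
        return html;
    }
}
```

For example, `SelectiveEncoder.Encode("<p>Give me 100.000 €!</p>")` returns `<p>Give me 100.000 &euro;!</p>`. The trade-off is that only the characters you list are encoded; anything else passes through unchanged.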


You could use HtmlTextWriter together with HtmlEncode: use HtmlTextWriter to write the <p></p> element, and HtmlEncode only the text that goes inside it. HtmlTextWriter writes into an underlying TextWriter (for example a StringWriter), so you can get the finished markup with ToString() on that writer, and it shouldn't take much more code.
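A minimal sketch of that approach, assuming .NET Framework: HtmlTextWriter lives in System.Web.UI (System.Web.dll) and is not available in .NET Core or .NET 5+. The `ParagraphBuilder` class and `Paragraph` method are hypothetical names. HtmlTextWriter may add line breaks or indentation around block-level tags, so don't rely on an exact string match.

```csharp
using System.IO;
using System.Web;      // HttpUtility (System.Web.dll, .NET Framework)
using System.Web.UI;   // HtmlTextWriter, HtmlTextWriterTag

public static class ParagraphBuilder
{
    // Hypothetical helper: writes a <p> element with HtmlTextWriter
    // and HTML-encodes only the paragraph body.
    public static string Paragraph(string text)
    {
        using (var buffer = new StringWriter())
        using (var writer = new HtmlTextWriter(buffer))
        {
            writer.RenderBeginTag(HtmlTextWriterTag.P);   // <p>
            writer.Write(HttpUtility.HtmlEncode(text));   // encoded body
            writer.RenderEndTag();                        // </p>
            writer.Flush();

            // The markup is read from the underlying StringWriter.
            return buffer.ToString();
        }
    }
}
```

For example, `ParagraphBuilder.Paragraph("Fish & Chips")` yields a `<p>` element whose body is `Fish &amp; Chips`. This only works when you build the markup yourself; it does not help with encoding an existing HTML string.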


As others have suggested, this can be achieved with HtmlAgilityPack.

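using System.IO;
using System.Web;
using HtmlAgilityPack;

public static class HtmlTextEncoder
{
    public static string HtmlEncode(string html)
    {
        if (html == null) return null;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Encode the text nodes; element nodes (the tags) are left untouched.
        EncodeNode(doc.DocumentNode);

        // Write empty elements in self-closing form, e.g. <br />.
        doc.OptionWriteEmptyNodes = true;

        // Save to a TextWriter so the result needs no byte/encoding round-trip.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    private static void EncodeNode(HtmlNode node)
    {
        if (node.HasChildNodes)
        {
            foreach (var childNode in node.ChildNodes)
            {
                if (childNode.NodeType == HtmlNodeType.Text)
                {
                    // Text nodes hold the raw source text, so entities already
                    // present (e.g. &amp;) are encoded a second time.
                    childNode.InnerHtml = HttpUtility.HtmlEncode(childNode.InnerHtml);
                }
                else
                {
                    // Element (or other) node: recurse into its children.
                    EncodeNode(childNode);
                }
            }
        }
        else if (node.NodeType == HtmlNodeType.Text)
        {
            // A lone text node passed in directly.
            node.InnerHtml = HttpUtility.HtmlEncode(node.InnerHtml);
        }
    }
}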

This walks every node in the HTML recursively and replaces the content of each text node with its HTML-encoded form; element nodes, and therefore the tags, are left untouched.

I've created a .NET fiddle to demonstrate this technique.
