Encode HTML in ASP.NET C# but leave tags intact
I need to encode a whole text while leaving the < and > intact.
Example:
<p>Give me 100.000 €!</p>
must become:
<p>Give me 100.000 &euro;!</p>
The HTML tags must remain intact.
Use a regular expression that matches either a tag or what's between tags, and encode what's between:
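// Requires: using System.Text.RegularExpressions; using System.Web;
html = Regex.Replace(
    html,
    "(<[^>]+>|[^<]+)",
    // Keep tags as they are; encode only the text between them.
    m => m.Value.StartsWith("<") ? m.Value : HttpUtility.HtmlEncode(m.Value)
);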
You might go for Html Agility Pack and then encode the values of the tags.
Maybe use string.Replace for just those characters you want to encode?
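A minimal sketch of that idea, assuming you only care about a small, known set of characters. The class name SelectiveEncoder and the entity table are hypothetical and only for illustration; extend the table with whatever characters you need. Because < and > are never touched, the tags survive as they are.

```csharp
using System;
using System.Collections.Generic;

public static class SelectiveEncoder // hypothetical helper, not from the question
{
    // Assumed mapping: only the characters listed here are replaced.
    private static readonly Dictionary<string, string> Entities =
        new Dictionary<string, string>
        {
            { "€", "&euro;" },
            { "£", "&pound;" },
            { "©", "&copy;" },
        };

    public static string Encode(string html)
    {
        if (html == null) return null;

        // Replace each listed character with its entity; everything else,
        // including the tag delimiters, is left as it was.
        foreach (var pair in Entities)
        {
            html = html.Replace(pair.Key, pair.Value);
        }
        return html;
    }
}
```

With this table, SelectiveEncoder.Encode("<p>Give me 100.000 €!</p>") returns "<p>Give me 100.000 &euro;!</p>". If you ever add "&" to the table, it would have to be replaced before the other entries, otherwise the ampersands of the entities just inserted would be encoded a second time.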
You could use HtmlTextWriter in addition to HtmlEncode. You would use HtmlTextWriter to set up your <p></p> and then just set the body of the <p></p> using HtmlEncode. HtmlTextWriter allows ToString() and a bunch of other methods, so it shouldn't be much more code.
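A rough sketch of what that could look like, assuming the .NET Framework System.Web assembly is referenced and that you are building the markup yourself rather than re-encoding an existing HTML string. The class and method names are hypothetical. Here the result is read from the underlying StringWriter, and WriteEncodedText is used to encode the body as it is written.

```csharp
using System.IO;
using System.Web.UI;

public static class ParagraphBuilder // hypothetical helper name
{
    public static string BuildParagraph(string text)
    {
        using (var stringWriter = new StringWriter())
        using (var writer = new HtmlTextWriter(stringWriter))
        {
            writer.RenderBeginTag(HtmlTextWriterTag.P); // writes the opening <p>
            writer.WriteEncodedText(text);              // writes the body, HTML-encoded
            writer.RenderEndTag();                      // writes the closing </p>
            writer.Flush();
            return stringWriter.ToString();
        }
    }
}
```

This only helps when the tags and the text are available separately; for an existing HTML string you would still need one of the parsing approaches in the other answers.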
As others have suggested, this can be achieved with HtmlAgilityPack.
using System.IO;
using System.Web;
using HtmlAgilityPack;

public static class HtmlTextEncoder
{
    public static string HtmlEncode(string html)
    {
        if (html == null) return null;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Encode the text nodes in place, leaving the element nodes alone.
        EncodeNode(doc.DocumentNode);

        doc.OptionWriteEmptyNodes = true;
        using (var s = new MemoryStream())
        {
            doc.Save(s);
            var encoded = doc.Encoding.GetString(s.ToArray());
            return encoded;
        }
    }

    // Walks the tree recursively and HTML-encodes every text node it finds.
    private static void EncodeNode(HtmlNode node)
    {
        if (node.HasChildNodes)
        {
            foreach (var childNode in node.ChildNodes)
            {
                if (childNode.NodeType == HtmlNodeType.Text)
                {
                    childNode.InnerHtml = HttpUtility.HtmlEncode(childNode.InnerHtml);
                }
                else
                {
                    EncodeNode(childNode);
                }
            }
        }
        else if (node.NodeType == HtmlNodeType.Text)
        {
            node.InnerHtml = HttpUtility.HtmlEncode(node.InnerHtml);
        }
    }
}
This iterates through all the nodes in the HTML, and replaces any text nodes with HTML encoded text.
I've created a .NET fiddle to demonstrate this technique.