
How to HtmlEncode only text content in an HTML string?

I have some HTML content that I'd like to parse and encode before displaying it in my web pages.

The catch is that I want to encode only the text content, not the HTML tags themselves. How can I achieve that?

Example:

Given

"Some text & links : <strong>bla blà blö</strong> and <a href="http://www.google.com">go there</a> for only 15 € < 20 €"

I'd like to output

"Some text &amp; links : <strong>bla bl&agrave; bl&ouml;</strong> and <a href="http://www.google.com">go there</a> for only 15 &euro; &lt; 20 &euro;"
or
"Some text &#38; links : <strong>bla bl&#224; bl&#246;</strong> and <a href="http://www.google.com">go there</a> for only 15 &#8364; &#60; 20 &#8364;"


Use the Html Agility Pack:

var html = 
  "Some text & links : <strong>bla blà blö</strong> and <a href=\"http://www.google.com\">go there</a> for only 15 € < 20 €";

// Replace characters above 127 with HTML entities (the result must be assigned)
var encoded = HtmlAgilityPack.HtmlEntity.Entitize(html);

// Outputs
Some text & links : <strong>bla bl&agrave; bl&ouml;</strong> and <a href="http://www.google.com">go there</a> for only 15 &euro; < 20 &euro;

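I just tested it on your example: it converts the accented characters and the euro signs, but note that the bare & and < in the text are left unchanged, as the output above shows.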

If you want to see how it's done, the source code is public.


I know this is an old topic, but I think this snippet might do a good job. I also know you're not supposed to parse HTML with regular expressions (this one does not handle the contents of <script> and <style> at all), but this method might be all you need instead of pulling in the whole Html Agility Pack. I used SqlString because this method is called from my SQL Server database; it can easily be switched to string. It is also easy to change it to use a StringBuilder to make it more efficient.

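using System.Data.SqlTypes;
using System.Text.RegularExpressions;

// Encodes (encode = true) or decodes (encode = false) only the text between tags,
// leaving the tags themselves untouched.
private static SqlString fnHTMLDecodeEncode(SqlString html, bool encode)
{
  if (html.IsNull)
    return SqlString.Null;

  const RegexOptions REGOPT = RegexOptions.Singleline | RegexOptions.Compiled;

  string s = html.Value;
  // A "tag" is '<' followed by '!', a letter or '/', up to the next '>'.
  // A bare '<' such as the one in "15 € < 20 €" is therefore treated as text.
  var m = Regex.Matches(s, @"<[!A-Za-z\/][^>]*>", REGOPT);

  // Start with the text after the last tag (or the whole string if there are no tags).
  int proStart, proLen;
  if (m.Count == 0)
  {
    proStart = 0;
    proLen = s.Length;
  }
  else
  {
    proStart = m[m.Count - 1].Index + m[m.Count - 1].Length;
    proLen = s.Length - proStart;
  }

  // Walk backwards so that rewriting a segment does not shift the
  // indices of the matches that are still to be processed.
  for (int i = m.Count; i >= 0; i--)
  {
    if (i < m.Count)
    {
      // Text between the previous tag (or the start of the string) and tag i.
      proStart = (i == 0 ? 0 : m[i - 1].Index + m[i - 1].Length);
      proLen = m[i].Index - proStart;
    }

    // Process every non-empty segment, including one-character ones such as "&".
    if (proLen > 0)
    {
      var orig = s.Substring(proStart, proLen);
      var enc = (encode ? System.Net.WebUtility.HtmlEncode(orig) : System.Net.WebUtility.HtmlDecode(orig));
      // If the length is unchanged, nothing was encoded or decoded.
      if (orig.Length != enc.Length)
      {
        s = s.Remove(proStart, proLen).Insert(proStart, enc);
      }
    }
  }

  return new SqlString(s);
}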
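The answer above notes that the method can be switched to plain string and to a StringBuilder. Below is a minimal sketch of that variant, assuming the same tag pattern and System.Net.WebUtility for the actual encoding; the class and method names (HtmlTextEncoder, EncodeTextOnly) are hypothetical. Instead of editing the string in place from the end, it walks the tag matches from left to right and appends each text segment (encoded or decoded) and each tag (verbatim) to a StringBuilder, so no index bookkeeping is needed. It shares the regex approach's limitation: the contents of <script> and <style> are treated as ordinary text. Also note that WebUtility.HtmlEncode produces numeric entities (for example &#224; for à) and, as far as I know, only encodes characters in the range U+00A0 to U+00FF besides & < > " ', so the € sign is left as-is.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: string/StringBuilder version of the regex approach above.
static class HtmlTextEncoder
{
    // Same tag pattern as above: '<' followed by '!', a letter or '/', up to the next '>'.
    private static readonly Regex TagPattern =
        new Regex(@"<[!A-Za-z/][^>]*>", RegexOptions.Singleline | RegexOptions.Compiled);

    // Encodes (encode = true) or decodes (encode = false) only the text between tags;
    // every matched tag is copied to the output unchanged.
    public static string EncodeTextOnly(string html, bool encode = true)
    {
        if (string.IsNullOrEmpty(html)) return html;

        var sb = new StringBuilder(html.Length);
        int pos = 0;
        foreach (Match tag in TagPattern.Matches(html))
        {
            // Text between the previous tag (or the start) and this tag.
            AppendText(sb, html.Substring(pos, tag.Index - pos), encode);
            sb.Append(tag.Value);
            pos = tag.Index + tag.Length;
        }
        // Trailing text after the last tag (or the whole string if there were no tags).
        AppendText(sb, html.Substring(pos), encode);
        return sb.ToString();
    }

    private static void AppendText(StringBuilder sb, string text, bool encode)
    {
        if (text.Length == 0) return;
        sb.Append(encode ? WebUtility.HtmlEncode(text) : WebUtility.HtmlDecode(text));
    }
}
```

For the sample string from the question, EncodeTextOnly(input) turns the & and < in the text into &amp; and &lt;, leaves the <strong> and <a href="..."> tags untouched, and calling EncodeTextOnly on that result with encode = false gives back the original string.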