
How to encode 'á' to '&#225;' with C#? (UTF-8)

I'm trying to write an XML file with UTF-8 encoding. The original string can contain characters that I believe are invalid, like 'á', so I need to change these characters to valid ones.

I know there is an encoding method that takes, for example, the character á and transforms it into the group of characters &#225;.

I am trying to achieve this with C#, but I have had no success. I am using the Encoding.UTF8 functions, but I only end up with the same character (i.e. á) or a '?' character.

So, do you know what the correct way is to achieve this character change with C#?

Thanks for your time and help :)

LLORENS


You can use any one of the following methods.

Here are 4 ways you can encode XML in C#:

  1. string.Replace() 5 times

This is ugly but it works. Note that Replace("&", "&amp;") has to be the first replace, so that we don't re-escape the & of entities produced by the other replacements.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = xml.Replace("&","&amp;").Replace("<","&lt;").Replace(">","&gt;").Replace("\"", "&quot;").Replace("'", "&apos;");

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;

  2. System.Web.HttpUtility.HtmlEncode()

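Used for encoding HTML, but HTML and XML share the same escapes for <, >, & and ", so we can use it here too. Mostly used in ASP.NET apps. Note that HtmlEncode does NOT produce &apos; for apostrophes ( ' ): in the framework version used here it leaves them untouched, and from .NET Framework 4 on it emits the numeric reference &#39; instead.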

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = HttpUtility.HtmlEncode(xml);

// RESULT: &lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;

  3. System.Security.SecurityElement.Escape()

In Windows Forms or Console apps I use this method. If nothing else it saves me including the System.Web reference in my projects and it encodes all 5 chars.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = System.Security.SecurityElement.Escape(xml);

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;

  4. System.Xml.XmlTextWriter

Using XmlTextWriter you don't have to worry about escaping anything, since it escapes the chars where needed. For example, in attributes it doesn't escape apostrophes, while in node values it escapes neither apostrophes nor quotes.

string xml = "<node>it's my \"node\" & i like it<node>";
using (XmlTextWriter xtw = new XmlTextWriter(@"c:\xmlTest.xml", Encoding.Unicode))
{

    xtw.WriteStartElement("xmlEncodeTest");
    xtw.WriteAttributeString("testAttribute", xml);
    xtw.WriteString(xml);
    xtw.WriteEndElement();

}

// RESULT:
/*
<xmlEncodeTest testAttribute="&lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;">
    &lt;node&gt;it's my "node" &amp; i like it&lt;node&gt;
</xmlEncodeTest>
*/

[http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx]
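
The ordering note in method 1 can be checked with a small sketch. The class and method names below (ReplaceOrderDemo, EscapeAmpFirst, EscapeAmpLast) are made up for illustration and are not part of the linked article; the second method deliberately does the ampersand replace last to show the double escaping.

```csharp
using System;

public static class ReplaceOrderDemo
{
    // Escapes the five XML special characters, ampersand first (the order recommended above).
    public static string EscapeAmpFirst(string s)
    {
        return s.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;")
                .Replace("\"", "&quot;").Replace("'", "&apos;");
    }

    // Same replacements with the ampersand last: the & of every entity
    // produced by the earlier replaces gets escaped a second time.
    public static string EscapeAmpLast(string s)
    {
        return s.Replace("<", "&lt;").Replace(">", "&gt;")
                .Replace("\"", "&quot;").Replace("'", "&apos;").Replace("&", "&amp;");
    }
}
```

For the input "a < b & c", EscapeAmpFirst returns "a &lt; b &amp; c", while EscapeAmpLast returns "a &amp;lt; b &amp; c", which a reader would decode as the literal text "&lt;" instead of "<".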


á is not an "invalid" character. It has a UTF-8 encoding (bytes 195 and 161), and Nick is right that if you construct everything correctly this will be transparent.
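
To make that concrete, here is a minimal sketch, assuming the file is produced with XmlWriter and a UTF-8 encoding without a byte order mark. The class name Utf8Demo, its methods, and the element name "name" are hypothetical. It shows the two bytes mentioned above and that the writer emits á as-is, with no character reference needed.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Utf8Demo
{
    // Returns the UTF-8 byte sequence of a string.
    public static byte[] Utf8Bytes(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    // Writes <name>text</name> as a UTF-8 XML document into memory and returns the raw bytes.
    public static byte[] WriteXml(string text)
    {
        using (var ms = new MemoryStream())
        {
            // UTF8Encoding(false) means: UTF-8 without a byte order mark.
            var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
            using (var writer = XmlWriter.Create(ms, settings))
            {
                writer.WriteElementString("name", text);
            }
            return ms.ToArray();
        }
    }
}
```

Utf8Bytes("\u00E1") yields the bytes 195 and 161, and decoding the result of WriteXml("\u00E1") as UTF-8 gives a document that contains <name>á</name>.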


    private static string Escape(string content)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings 
        { 
            ConformanceLevel = ConformanceLevel.Fragment 
        };

        using (var xmlWriter = XmlWriter.Create(sb, settings))
            xmlWriter.WriteString(content);

        return sb.ToString();
    }


This is exactly what you need: (found at http://www.codeproject.com/Articles/20255/Full-HTML-Character-Encoding-in-C)

// For example, this transforms "čas" to "&#269;as".
public static string HtmlEncode(string text)
{
    // Escape the HTML special characters (<, >, &, ") first.
    char[] chars = HttpUtility.HtmlEncode(text).ToCharArray();
    StringBuilder result = new StringBuilder(text.Length + (int)(text.Length * 0.1));

    foreach (char c in chars)
    {
        // Replace every non-ASCII char with a decimal character reference.
        int value = Convert.ToInt32(c);
        if (value > 127)
            result.AppendFormat("&#{0};", value);
        else
            result.Append(c);
    }

    return result.ToString();
}
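
One caveat with a char-by-char loop like the one above: characters outside the Basic Multilingual Plane are stored as two surrogate chars, so each half would get its own reference. A possible variation, not taken from the linked article, walks code points instead. The names NumericEntityEncoder and EncodeNonAscii are hypothetical, and this sketch does not escape <, > or & itself, so it would still need to be combined with one of the escaping methods above.

```csharp
using System;
using System.Text;

public static class NumericEntityEncoder
{
    // Replaces every non-ASCII code point with a decimal character reference.
    // Does NOT escape <, >, & or quotes.
    public static string EncodeNonAscii(string text)
    {
        var result = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsSurrogatePair(text, i))
            {
                // Combine the high and low surrogate into a single code point.
                int codePoint = char.ConvertToUtf32(text, i);
                result.Append("&#").Append(codePoint).Append(';');
                i++; // skip the low surrogate
            }
            else if (text[i] > 127)
            {
                result.Append("&#").Append((int)text[i]).Append(';');
            }
            else
            {
                result.Append(text[i]);
            }
        }
        return result.ToString();
    }
}
```

EncodeNonAscii("\u00E1") returns "&#225;", "\u010Das" (čas) becomes "&#269;as", and the single code point U+1F600 becomes "&#128512;" instead of two separate surrogate references.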