Can't understand these XML encoding woes

The following hunk of code (snipped for brevity) generates an XML doc and spits it out to a file. If I open the file in Visual Studio, it appears to be in Chinese characters. If I open it in Notepad, it looks as expected. If I Console.WriteLine it, it looks correct.

I know it's related to encoding, but I thought I had all the encoding ducks in a row. What's missing?

StringBuilder stringBuilder = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.Unicode;
settings.Indent = true; 
settings.IndentChars = "\t";
using (XmlWriter textWriter = XmlWriter.Create(new StringWriter(stringBuilder), settings))
{
    textWriter.WriteStartElement("Submission");
    textWriter.WriteAttributeString("xmlns", "xsi", null, "http://www.w3.org/2001/XMLSchema-instance");
    textWriter.WriteEndElement();
}

using (StreamWriter sw = new StreamWriter(new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None)))
{
    sw.Write(stringBuilder.ToString());
}


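The problem is that you're writing the file to disk as UTF-8 (the StreamWriter default), but the XML declaration claims UTF-16, because that's what a StringWriter reports by default, and because you're explicitly setting Encoding.Unicode as well. Visual Studio appears to trust the declaration and read each pair of UTF-8 bytes as a single UTF-16 character, which would explain the Chinese-looking text.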

The simplest way to fix this is to use a StringWriter which advertises itself as UTF-8:

public class Utf8StringWriter : StringWriter
{
    // XmlWriter reads this property to decide what the XML declaration says.
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

... and then remove the settings.Encoding = Encoding.Unicode line. That way you'll use UTF-8 throughout. (In fact, the Encoding property of XmlWriterSettings is ignored when you create the XmlWriter with a TextWriter anyway.)
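Putting those two changes together might look like the sketch below. The SubmissionWriter class, the BuildXml helper, and the "submission.xml" path are hypothetical names chosen for illustration; the element and namespace are taken from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// StringWriter that reports UTF-8, so the XML declaration says encoding="utf-8".
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class SubmissionWriter
{
    // Hypothetical helper: builds the document in memory and returns it as a string.
    public static string BuildXml()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.IndentChars = "\t";
        // No settings.Encoding here: it is ignored when writing to a TextWriter.

        using (StringWriter stringWriter = new Utf8StringWriter())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                xmlWriter.WriteStartElement("Submission");
                xmlWriter.WriteAttributeString("xmlns", "xsi", null, "http://www.w3.org/2001/XMLSchema-instance");
                xmlWriter.WriteEndElement();
            }
            return stringWriter.ToString();
        }
    }

    public static void Main()
    {
        string fileName = "submission.xml"; // hypothetical output path

        // Write the bytes as UTF-8 (no BOM), matching what the declaration claims.
        using (StreamWriter sw = new StreamWriter(fileName, false, new UTF8Encoding(false)))
        {
            sw.Write(BuildXml());
        }
    }
}
```

The point of the design is simply that the declared encoding and the bytes on disk agree; either half on its own leaves the mismatch in place.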

If you really want UTF-16, then when you create the StreamWriter, specify Encoding.Unicode there too.
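A rough sketch of the UTF-16 route, assuming a plain StringWriter is kept so the declaration still says utf-16. The Utf16Example class, the BuildXml helper, and the file name are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Utf16Example
{
    // Hypothetical helper: a plain StringWriter reports UTF-16 to the XmlWriter.
    public static string BuildXml()
    {
        StringBuilder stringBuilder = new StringBuilder();
        using (XmlWriter xmlWriter = XmlWriter.Create(new StringWriter(stringBuilder)))
        {
            xmlWriter.WriteStartElement("Submission");
            xmlWriter.WriteEndElement();
        }
        return stringBuilder.ToString();
    }

    public static void Main()
    {
        string fileName = "submission-utf16.xml"; // hypothetical output path

        // Encoding.Unicode makes the bytes on disk UTF-16, matching the declaration.
        using (StreamWriter sw = new StreamWriter(fileName, false, Encoding.Unicode))
        {
            sw.Write(BuildXml());
        }

        // Show the first two bytes of the file (the byte order mark).
        byte[] bytes = File.ReadAllBytes(fileName);
        Console.WriteLine(BitConverter.ToString(bytes, 0, 2));
    }
}
```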


Encoding.Unicode is .NET's name for UTF-16 (little-endian), which writes two bytes per character into the file for most characters. For plain ASCII text, one of those two bytes is always 0.

Try UTF-8 instead. This should look the same in any editor unless you use special characters (with a code point >= 128).
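To see the difference in bytes, a small sketch like this can help. The EncodingBytes class and Hex helper are hypothetical names; the calls themselves are standard .NET.

```csharp
using System;
using System.Text;

public static class EncodingBytes
{
    // Hypothetical helper: encodes text and formats the bytes as hex pairs.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(Hex(Encoding.Unicode, "A"));   // 41-00 (UTF-16: two bytes, one is zero)
        Console.WriteLine(Hex(Encoding.UTF8, "A"));      // 41    (UTF-8: same as ASCII)
        Console.WriteLine(Hex(Encoding.UTF8, "\u00E9")); // C3-A9 (code point >= 128: two bytes)
    }
}
```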
