Can't understand these XML encoding woes
The following hunk of code (snipped for brevity) generates an XML doc and spits it out to a file. If I open the file in Visual Studio, it appears to be in Chinese characters. If I open it in Notepad, it looks as expected. If I Console.WriteLine it, it looks correct.
I know it's related to encoding, but I thought I had all the encoding ducks in a row. What's missing?
StringBuilder stringBuilder = new StringBuilder();
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.Unicode;
settings.Indent = true;
settings.IndentChars = "\t";
using (XmlWriter textWriter = XmlWriter.Create(new StringWriter(stringBuilder), settings))
{
    textWriter.WriteStartElement("Submission");
    textWriter.WriteAttributeString("xmlns", "xsi", null, "http://www.w3.org/2001/XMLSchema-instance");
    textWriter.WriteEndElement();
}
using (StreamWriter sw = new StreamWriter(new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None)))
{
    sw.Write(stringBuilder.ToString());
}
The problem is that you're writing it to disk using UTF-8, but the XML declaration will claim to be UTF-16, because that's the encoding a StringWriter reports by default (and you're explicitly setting Encoding.Unicode as well).
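To see the mismatch concretely, here is a minimal sketch that builds the same kind of document in memory and exposes both the text and the bytes a default StreamWriter would put on disk. The names DeclarationDemo, BuildXml and BytesOnDisk are made up for illustration and are not part of the question's code.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class DeclarationDemo
{
    // Builds the XML through a plain StringWriter, as the question does.
    // StringWriter.Encoding reports UTF-16, so the declaration names utf-16.
    public static string BuildXml()
    {
        var stringBuilder = new StringBuilder();
        var settings = new XmlWriterSettings { Encoding = Encoding.Unicode };
        using (XmlWriter writer = XmlWriter.Create(new StringWriter(stringBuilder), settings))
        {
            writer.WriteStartElement("Submission");
            writer.WriteEndElement();
        }
        return stringBuilder.ToString();
    }

    // Mirrors what a StreamWriter created without an explicit encoding does:
    // it encodes the text as UTF-8 without a byte order mark.
    public static byte[] BytesOnDisk(string xml)
    {
        return new UTF8Encoding(false).GetBytes(xml);
    }
}
```

Calling DeclarationDemo.BuildXml() returns text whose declaration says encoding="utf-16", while BytesOnDisk returns one byte per ASCII character. An editor that trusts the declaration would plausibly pair those bytes up as UTF-16 code units, which would explain the Chinese-looking characters.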
The simplest way to fix this is to use a StringWriter which advertises itself as UTF-8:
public class Utf8StringWriter : StringWriter
{
    // Report UTF-8 so the XmlWriter puts encoding="utf-8" in the XML declaration.
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}
... and then remove the settings.Encoding = Encoding.Unicode line. That way you'll use UTF-8 throughout. (In fact, the Encoding property of XmlWriterSettings is ignored when you create the XmlWriter with a TextWriter anyway.)
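Putting the pieces together, a sketch of the UTF-8 version might look like the following. It restates the Utf8StringWriter idea so that it compiles on its own. The Utf8Fix class and its BuildXml and Save methods are hypothetical names, and File.WriteAllText is swapped in for the question's StreamWriter purely for brevity.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// A StringWriter that advertises UTF-8 instead of the default UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class Utf8Fix
{
    // Builds the document; no settings.Encoding is set, because the
    // TextWriter's own Encoding decides what the declaration says.
    public static string BuildXml()
    {
        var settings = new XmlWriterSettings { Indent = true, IndentChars = "\t" };
        using (var stringWriter = new Utf8StringWriter())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                xmlWriter.WriteStartElement("Submission");
                xmlWriter.WriteAttributeString("xmlns", "xsi", null, "http://www.w3.org/2001/XMLSchema-instance");
                xmlWriter.WriteEndElement();
            }
            return stringWriter.ToString();
        }
    }

    // Writes the text as UTF-8 (no byte order mark), matching the declaration.
    public static void Save(string fileName)
    {
        File.WriteAllText(fileName, BuildXml(), new UTF8Encoding(false));
    }
}
```

With this arrangement, the declaration and the bytes on disk should agree, so editors that honour the declaration have nothing to misinterpret.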
If you really want UTF-16, then when you create the StreamWriter, specify Encoding.Unicode there too.
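For the UTF-16 route, a minimal sketch (the Utf16Save class and its Save method are illustrative names, not from the original code) passes Encoding.Unicode to the StreamWriter, so the bytes on disk match a utf-16 declaration:

```csharp
using System.IO;
using System.Text;

public static class Utf16Save
{
    // Writes already-built XML text to disk as UTF-16 (little-endian).
    // StreamWriter emits the encoding's byte order mark at the start of the file.
    public static void Save(string fileName, string xml)
    {
        using (var stream = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var sw = new StreamWriter(stream, Encoding.Unicode))
        {
            sw.Write(xml);
        }
    }
}
```

Here the xml argument would be the stringBuilder.ToString() result from the question, built through an ordinary StringWriter so that the declaration still reads utf-16.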
I'm not sure what Encoding.Unicode is, but I guess it's UTF-16 (it is: little-endian UTF-16), which writes two bytes per character into the file for text like this. For normal ASCII text, one of the two bytes is always 0.
Try UTF-8 instead. This should look the same in any editor unless you use special characters (with a code point >= 128).
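The byte-level difference is easy to check with a small helper. ByteDemo and Hex are hypothetical names used only for this sketch.

```csharp
using System;
using System.Text;

public static class ByteDemo
{
    // Returns the encoded bytes of the text as dash-separated hex, e.g. "41-00".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }
}
```

ByteDemo.Hex(Encoding.Unicode, "A") gives "41-00", while ByteDemo.Hex(Encoding.UTF8, "A") gives "41". For a non-ASCII character such as "\u00E9", UTF-8 needs two bytes ("C3-A9"), which is where editors that guess the encoding can start to disagree.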