Serializing an object to a string: why is my encoding adding stupid characters?

2023-03-24 15:15 Q&A author:

I need to get the serialized XML representation of an object as a string. I'm using the XmlSerializer and a MemoryStream to do this.

XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
using (MemoryStream stream = new MemoryStream())
{
  using (XmlTextWriter writer = new XmlTextWriter(stream,Encoding.UTF8))
  {
    serializer.Serialize(writer, myClass);
    string xml = Encoding.UTF8.GetString(stream.ToArray());
    //other chars may be added from the encoding.
    xml = xml.Substring(xml.IndexOf(Convert.ToChar(60)));
    xml = xml.Substring(0, (xml.LastIndexOf(Convert.ToChar(62)) + 1));
    return xml;
  }
}

Now just take note of the xml.Substring lines for a moment. What I'm finding is that, even though I'm specifying the encoding on both the XmlTextWriter and the GetString call (and I'm using MemoryStream.ToArray(), so I'm operating only on the data in the stream's buffer), the resulting XML string has a non-XML-friendly character added. In my case, a '?' at the start of the string. This is why I'm taking the substring between '<' and '>', to ensure I'm only getting the good stuff.

Strange thing is, looking at this string in the debugger (Text Visualizer), I don't see this '?'. Only when I paste what's in the visualizer into notepad or similar.

So while the above code (substring etc) does the job, what's actually happening here? Is some unsigned byte thing being included and not being represented in the Text Visualizer?

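You can exclude the BOM by constructing the encoding explicitly. Instead of Encoding.UTF8, try using new UTF8Encoding(false):

XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
using (MemoryStream stream = new MemoryStream())
{
  // false = do not emit the UTF-8 byte order mark (EF BB BF)
  var enc = new UTF8Encoding(false);
  using (XmlTextWriter writer = new XmlTextWriter(stream, enc))
  {
    serializer.Serialize(writer, myClass);
  }
  // Disposing the writer flushes it and closes the stream;
  // ToArray() still works on a closed MemoryStream, whereas Length throws.
  string xml = enc.GetString(stream.ToArray());
  return xml;
}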

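What you are looking at is a Byte Order Mark (BOM). Encoding.UTF8 writes one by default, so this is expected behavior.

In short, for my comment fans: a BOM is a byte sequence at the start of a stream that signals its encoding and, for UTF-16 and UTF-32, its byte order (endianness). UTF-8 has no byte order, so there the BOM (the bytes EF BB BF) acts only as an encoding signature.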
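To make the BOM visible, here is a minimal sketch (the class name BomDemo is made up for illustration). It assumes nothing beyond System.Text, and shows that Encoding.UTF8 advertises a three-byte preamble, that UTF8Encoding(false) advertises none, and that GetString decodes a leading BOM to the character U+FEFF rather than stripping it. That character is invisible in most viewers, which would explain why the Text Visualizer appears to show nothing.

```csharp
using System;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Encoding.UTF8 advertises a three-byte preamble (the UTF-8 BOM).
        byte[] withBom = Encoding.UTF8.GetPreamble();
        Console.WriteLine(BitConverter.ToString(withBom)); // EF-BB-BF

        // UTF8Encoding(false) advertises no preamble at all.
        byte[] withoutBom = new UTF8Encoding(false).GetPreamble();
        Console.WriteLine(withoutBom.Length); // 0

        // GetString does not strip a BOM; it decodes it to U+FEFF.
        byte[] bytes = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };
        string text = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(text.Length);            // 5
        Console.WriteLine((int)text[0] == 0xFEFF); // True
    }
}
```

The extra character at index 0 is what the question's Substring calls were trimming away.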

What you can do is either a) use ASCII as your encoding, which will drop the byte order mark, or b) why not leave it in? It does serve a useful function, after all, for your XML string.

Marc Gravell's answer gives a third alternative: create your own encoding object and pass false to the constructor to suppress the byte order mark.
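For anyone who wants to try the third alternative end to end, the following is a sketch under stated assumptions rather than a definitive implementation: MyClass here is a hypothetical stand-in for the question's type, and XmlStringHelper.ToXmlString is an invented helper name, not a library API.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the question's MyClass.
public class MyClass
{
    public string Name { get; set; }
}

public static class XmlStringHelper
{
    // Hypothetical helper; serializes to UTF-8 without a BOM and returns the text.
    public static string ToXmlString(MyClass value)
    {
        var serializer = new XmlSerializer(typeof(MyClass));
        var enc = new UTF8Encoding(false); // false: do not emit a BOM
        using (var stream = new MemoryStream())
        {
            using (var writer = new XmlTextWriter(stream, enc))
            {
                serializer.Serialize(writer, value);
            }
            // ToArray() remains usable after the writer has closed the stream.
            return enc.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        string xml = ToXmlString(new MyClass { Name = "test" });
        Console.WriteLine(xml[0] == '<'); // True: no U+FEFF in front
    }
}
```

With the BOM suppressed at the source, the Substring trimming from the question is no longer needed.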
