Parsing xml string to an xml document fails if the string begins with <?xml... ?> section

2022-12-17 12:47 问答作者：

I have an XML file begining like this:

<?xml version="1.0" encoding="utf-8"?>
<Report xmlns:rd="http://schemas.microsoft.com/SQLServer/reporting/reportdesign开发者_开发问答er" xmlns="http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition">
  <DataSources>

When I run following code:

byte[] fileContent = //gets bytes
            string stringContent = Encoding.UTF8.GetString(fileContent);
            XDocument xml = XDocument.Parse(stringContent);

I get following XmlException:

Data at the root level is invalid. Line 1, position 1.

Cutting out the version and encoding node fixes the problem. Why? How to process this xml correctly?

My first thought was that the encoding is Unicode when parsing XML from a .NET string type. It seems, though that XDocument's parsing is quite forgiving with respect to this.

The problem is actually related to the UTF8 preamble/byte order mark (BOM), which is a three-byte signature optionally present at the start of a UTF-8 stream. These three bytes are a hint as to the encoding being used in the stream.

You can determine the preamble of an encoding by calling the GetPreamble method on an instance of the System.Text.Encoding class. For example:

// returns { 0xEF, 0xBB, 0xBF }
byte[] preamble = Encoding.UTF8.GetPreamble();

The preamble should be handled correctly by XmlTextReader, so simply load your XDocument from an XmlTextReader:

XDocument xml;
using (var xmlStream = new MemoryStream(fileContent))
using (var xmlReader = new XmlTextReader(xmlStream))
{
    xml = XDocument.Load(xmlReader);
}

If you only have bytes you could either load the bytes into a stream:

XmlDocument oXML;

using (MemoryStream oStream = new MemoryStream(oBytes))
{
  oXML = new XmlDocument();
  oXML.Load(oStream);
}

Or you could convert the bytes into a string (presuming that you know the encoding) before loading the XML:

string sXml;
XmlDocument oXml;

sXml = Encoding.UTF8.GetString(oBytes);
oXml = new XmlDocument();
oXml.LoadXml(sXml);

I've shown my example as .NET 2.0 compatible, if you're using .NET 3.5 you can use XDocument instead of XmlDocument.

Load the bytes into a stream:

XDocument oXML;

using (MemoryStream oStream = new MemoryStream(oBytes))
using (XmlTextReader oReader = new XmlTextReader(oStream))
{
  oXML = XDocument.Load(oReader);
}

Convert the bytes into a string:

string sXml;
XDocument oXml;

sXml = Encoding.UTF8.GetString(oBytes);
oXml = XDocument.Parse(sXml);

Do you have a byte-order-mark (BOM) at the beginning of your XML, and does it match your encoding ? If you chop out your header, you'll also chop out the BOM and if that is incorrect, then subsequent parsing may work.

You may need to inspect your document at the byte level to see the BOM.

Why bothering to read the file as a byte sequence and then converting it to string while it is an xml file? Just leave the framework do the loading for you and cope with the encodings:

var xml = XDocument.Load("test.xml");

Try this:

int startIndex = xmlString.IndexOf('<');
if (startIndex > 0)
{
    xmlString = xmlString.Remove(0, startIndex);
}

I have also encountered this error because the source XML was a string that somehow got some non-printable characters that seemed to break XmlDocument or XDocument parsing. Stripping them fixed the issue:

string sanitized = Regex.Replace(part, @"\p{C}+", string.Empty);

Credit: C# regex to remove non - printable characters, and control characters, in a text that has a mix of many different languages, unicode letters

继续阅读：.net xml

Parsing xml string to an xml document fails if the string begins with <?xml... ?> section

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？