XmlReader newline \n instead of \r\n
When I use XmlReader.ReadOuterXml(), elements are separated by \n instead of \r\n. So, for example, if I have an XmlDocument representation of
<A>
<B>
</B>
</A>
I get
<A>\n<B>\n</B>\n</A>
Is there an option to specify the newline character? XmlWriterSettings has one, but XmlReader doesn't seem to.
Here is my code to read the XML. Note that XmlWriterSettings has NewLineHandling = Replace by default.
using System;
using System.IO;
using System.Xml;

// Method name is illustrative; the caller supplies the XmlDocument
// (originally "<Generate some XmlDocument>").
static string GetXmlString(XmlDocument xmlDocument)
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    // Use a memory stream because it accepts UTF8 characters. If we use a
    // string builder the XML will be UTF16.
    using (MemoryStream memStream = new MemoryStream())
    {
        using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
        {
            xmlDocument.Save(xmlWriter);
        }
        // Set the pointer back to the beginning of the stream to be read
        memStream.Position = 0;
        using (XmlReader reader = XmlReader.Create(memStream))
        {
            reader.Read();                 // moves onto the XML declaration
            string header = reader.Value;  // declaration content, e.g. version="1.0" encoding="utf-8"
            reader.MoveToContent();
            return "<?xml " + header + " ?>" + Environment.NewLine + reader.ReadOuterXml();
        }
    }
}
XmlReader will automatically normalize \r\n to \n. Although this seems unusual on Windows, it is actually required by the XML Specification (http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends).
You can do a String.Replace:
string s = reader.ReadOuterXml().Replace("\n", "\r\n");
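A minimal sketch of the normalization and the replace, assuming the XML is held in a string instead of a file. NewlineDemo, ReadOuter and ToCrLf are illustrative names, not part of any library. ToCrLf collapses existing \r\n pairs to \n before expanding, so text that already contains some CRLF pairs does not end up with \r\r\n.

```csharp
using System.IO;
using System.Xml;

static class NewlineDemo
{
    // Returns the outer XML of the root element. XmlReader.Create normalizes
    // "\r\n" to "\n" while parsing, as the XML spec requires.
    public static string ReadOuter(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.ReadOuterXml();
        }
    }

    // Converts every line end to "\r\n". Collapsing "\r\n" to "\n" first
    // avoids producing "\r\r\n" where CRLF pairs are already present.
    public static string ToCrLf(string s)
    {
        return s.Replace("\r\n", "\n").Replace("\n", "\r\n");
    }
}
```

Calling NewlineDemo.ToCrLf(NewlineDemo.ReadOuter(xml)) gives the reader's output with Windows line ends. Like the one-line Replace, it rewrites every \n, including any inside text content.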
I had to write database data to an XML file and read it back using LINQ to XML. Some fields in a record were themselves XML strings, complete with \r characters, and these had to remain intact. I spent days trying to find something that would work, but it seems that Microsoft's XML classes were converting \r to \n by design.
The following solution works for me:
To write a loaded XDocument to the XML file keeping \r intact, where xDoc is an XDocument and filePath is a string:
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings
    { NewLineHandling = NewLineHandling.None, Indent = true };
using (XmlWriter xmlWriter = XmlWriter.Create(filePath, xmlWriterSettings))
{
    xDoc.Save(xmlWriter);
    xmlWriter.Flush();
}
To read an XML file into an XElement keeping \r intact:
using (XmlTextReader xmlTextReader = new XmlTextReader(filePath)
    { WhitespaceHandling = WhitespaceHandling.Significant })
{
    xmlTextReader.MoveToContent();
    xDatabaseElement = XElement.Load(xmlTextReader);
}
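The write and read steps above can be sketched as one in-memory round trip, assuming a StringWriter/StringReader stands in for filePath so the result can be checked directly. CrRoundTrip and RoundTrip are illustrative names. The writer leaves \r untouched because of NewLineHandling.None, and XmlTextReader keeps it because it does not normalize line ends by default.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class CrRoundTrip
{
    public static XElement RoundTrip(XDocument xDoc)
    {
        // Write: NewLineHandling.None emits "\r" and "\n" in text exactly as stored.
        var buffer = new StringWriter();
        var settings = new XmlWriterSettings { NewLineHandling = NewLineHandling.None, Indent = true };
        using (XmlWriter xmlWriter = XmlWriter.Create(buffer, settings))
        {
            xDoc.Save(xmlWriter);
        }

        // Read: the legacy XmlTextReader does not normalize line ends by default.
        // Significant drops the indentation whitespace that the writer added.
        using (var xmlTextReader = new XmlTextReader(new StringReader(buffer.ToString()))
            { WhitespaceHandling = WhitespaceHandling.Significant })
        {
            xmlTextReader.MoveToContent();
            return XElement.Load(xmlTextReader);
        }
    }
}
```

With files, the same settings and reader apply; only the StringWriter and StringReader are swapped for filePath.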
Solution 1: Write entitized XML
Use a well-configured XmlWriter with the NewLineHandling.Entitize option, so that the XmlReader will not normalize the line endings.
You can use such a custom XmlWriter even with XDocument:
using (XmlWriter writer = XmlWriter.Create(fileName, new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize }))
    xDoc.Save(writer); // disposing the writer flushes it and closes the file
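A minimal sketch of why Entitize works, assuming an in-memory string instead of fileName. EntitizeDemo, SaveEntitized and Load are illustrative names. In text content the writer emits \r as the character reference &#xD;, and a normal (normalizing) XmlReader only rewrites literal carriage returns, so the value comes back as \r\n.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class EntitizeDemo
{
    // Serializes with NewLineHandling.Entitize: "\r" in text becomes "&#xD;".
    public static string SaveEntitized(XDocument xDoc)
    {
        var buffer = new StringWriter();
        var settings = new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize };
        using (XmlWriter writer = XmlWriter.Create(buffer, settings))
        {
            xDoc.Save(writer);
        }
        return buffer.ToString();
    }

    // Parses with the standard XmlReader, which normalizes literal line ends
    // but leaves characters produced by "&#xD;" references alone.
    public static XDocument Load(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return XDocument.Load(reader);
        }
    }
}
```

The trade-off is that the file now contains &#xD; references, which other tools will see as such when they read it as plain text.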
Solution 2: Read non-entitized XML without normalization
Solution 1 is the cleaner way. However, you may already have non-entitized XML whose creation you cannot change, and still want to prevent normalization. The accepted answer suggests a replace, but that blindly replaces every \n occurrence, even where that is not desirable. To retrieve all of the line endings exactly as they are in the file, you can use the legacy XmlTextReader class, which does not normalize XML by default. You can use it with XDocument, too:
var xDoc = XDocument.Load(new XmlTextReader(fileName));
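A minimal sketch contrasting the two readers on the same input, assuming the XML is in a string rather than fileName. ReaderComparison and its method names are illustrative. XmlReader.Create turns the literal \r\n into \n, while XmlTextReader, whose Normalization property defaults to false, returns it unchanged.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ReaderComparison
{
    // XmlReader.Create applies XML line-end normalization ("\r\n" -> "\n").
    public static string ValueViaXmlReader(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            return XDocument.Load(reader).Root.Value;
        }
    }

    // The legacy XmlTextReader does not normalize unless Normalization is set to true.
    public static string ValueViaXmlTextReader(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            return XDocument.Load(reader).Root.Value;
        }
    }
}
```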
There's a quicker way if you're just trying to get to UTF-8. First create a writer:
using System.IO;
using System.Text;

public class EncodedStringWriter : StringWriter
{
    public EncodedStringWriter(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }
    private Encoding _encoding;
    // Reported to XmlWriter, which uses it for the encoding="..." declaration
    public override Encoding Encoding
    {
        get
        {
            return _encoding;
        }
    }
}
Then use it:
XmlDocument doc = new XmlDocument();
doc.LoadXml("<foo><bar /></foo>");
StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.Indent = true;
using( EncodedStringWriter w = new EncodedStringWriter(sb, Encoding.UTF8) )
{
    using( XmlWriter writer = XmlWriter.Create(w, xws) )
    {
        doc.WriteTo(writer);
    }
}
string xml = sb.ToString();
Gotta give credit where credit is due.
XmlReader reads files; it does not write them. If you are getting \n in your reader, it is because that's what's in the file. Both \n and \r are whitespace and are semantically the same in XML; the difference will not affect the meaning or content of the data.
Edit:
That looks like C#, not Ruby. As binarycoder says, ReadOuterXml is defined to return normalized XML. Typically this is what you want. If you want the raw XML, you should use Encoding.UTF8.GetString(memStream.ToArray()), not XmlReader.
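A minimal sketch of that suggestion applied to the question's code, assuming the goal is the writer's own output with Windows line ends. It skips the XmlReader entirely, sets NewLineChars to \r\n explicitly so the result does not depend on the platform default, and uses a UTF-8 encoding without a byte order mark, because the default UTF-8 encoding typically writes one and Encoding.UTF8.GetString would then return a leading U+FEFF. RawXml and ToIndentedXml are illustrative names.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class RawXml
{
    public static string ToIndentedXml(XmlDocument xmlDocument)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            NewLineChars = "\r\n",                      // explicit CRLF line breaks
            NewLineHandling = NewLineHandling.Replace,  // also rewrite line ends inside text
            Encoding = new UTF8Encoding(false)          // no BOM in the byte output
        };
        using (var memStream = new MemoryStream())
        {
            using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
            {
                xmlDocument.Save(xmlWriter);
            }
            // Decode the bytes the writer produced instead of re-parsing them,
            // so nothing is normalized back to "\n".
            return Encoding.UTF8.GetString(memStream.ToArray());
        }
    }
}
```

This also makes the manual "<?xml " + header + " ?>" reconstruction unnecessary, since the writer's declaration is kept as-is.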