XDocument Text Node New Line

I'm trying to get a newline into a text node using XText from the System.Xml.Linq namespace.

I have a string which contains newline characters, but I need to work out how to convert these to character references (i.e. &#10;) rather than just having them appear in the XML as new lines.

XElement element = new XElement( "NodeName" );
...

string example = "This is a string\nWith new lines in it\n";

element.Add( new XText( example ) );

The XElement is then written out using an XmlTextWriter which results in the file containing the newline rather than an entity replacement.

Has anyone come across this problem and found a solution?


EDIT:

The problem manifests itself when I load the XML into Excel, which doesn't seem to like the newline character but does accept the character reference. The result is that newlines don't show in Excel unless I replace them with &#10;.


Nick.


Cheating:

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(..., settings);
        element.WriteTo(writer);
        writer.Flush();

UPDATE:

Complete program

using System;
using System.Xml;
using System.Xml.Linq;


namespace ConsoleApplication1
{
class Program
{
    static void Main(string[] args)
    {
        XElement element = new XElement( "NodeName" );
        string example = "This is a string\nWith new lines in it\n";
        element.Add( new XText( example ) );

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.CheckCharacters = false;
        settings.NewLineChars = "&#10;";
        XmlWriter writer = XmlWriter.Create(Console.Out, settings);
        element.WriteTo(writer);
        writer.Flush();
    }
}
}

OUTPUT:

C:\Users\...\\ConsoleApplication1\bin\Release>ConsoleApplication1.exe
<?xml version="1.0" encoding="ibm850"?>&#10;<NodeName>This is a string&#10;With new lines in it&#10;</NodeName>


To any standard XML parser there is no difference, in element text, between the character reference &#10; and a literal newline character: both produce the same character.

The following code illustrates this:

string s1 = "<root>Test&#10;Test2</root>";
string s2 = "<root>Test\nTest2</root>";

XDocument doc1 = XDocument.Parse(s1);
XDocument doc2 = XDocument.Parse(s2);

Console.WriteLine(doc1.ToString());
Console.WriteLine(doc2.ToString());
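
A more direct check, as a minimal sketch: rather than printing both documents and comparing them by eye, compare the parsed values in code. The class and method names (CharRefDemo, AreEquivalent) are made up for this illustration.

```csharp
using System;
using System.Xml.Linq;

static class CharRefDemo
{
    // Hypothetical helper: parses both spellings of the newline and
    // reports whether the resulting documents are equivalent.
    public static bool AreEquivalent()
    {
        XDocument doc1 = XDocument.Parse("<root>Test&#10;Test2</root>");
        XDocument doc2 = XDocument.Parse("<root>Test\nTest2</root>");

        // Same text value and same node structure.
        return doc1.Root.Value == doc2.Root.Value
            && XNode.DeepEquals(doc1, doc2);
    }

    static void Main()
    {
        Console.WriteLine(AreEquivalent()); // True
    }
}
```

So the difference only exists in the serialized file, which is why it has to be handled at the writer level.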


It's the XmlTextWriter which is responsible for escaping what it outputs. So if you do this, for example:

        using (XmlTextWriter w = new XmlTextWriter("test.xml", Encoding.UTF8))
        {
            w.WriteString("&#10;");
        }

You will get an escaped ampersand in test.xml (&amp;#10;), which you don't want. You would like to keep the &#10; sequence raw, as is.

The solution I propose is to create a new StreamWriter subclass capable of detecting an escaped string like "&amp;#10;" and writing it out unescaped:

    // A StreamWriter that does not escape &#10; characters
    public class NonXmlEscapingStreamWriter : StreamWriter
    {
        private const string AmpToken = "amp";
        private int _bufferState = 0; // used to keep state

        // add other ctors overloads if needed
        public NonXmlEscapingStreamWriter(string path)
            : base(path)
        {
        }

        // NOTE: this code assumes that StreamWriter only overrides these 4 Write
        // functions, which is true today but could change in the future. It also
        // assumes that XmlTextWriter writes escaped values in a specific sequence
        // of Write calls ('&', then the entity name, then ';').
        public override void Write(char value)
        {
            if (value == '&')
            {
                if (_bufferState == 0)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    _bufferState = 0;
                }
            }
            else if (value == ';')
            {
                if (_bufferState > 1)
                {
                    _bufferState++;
                    return;
                }
                else
                {
                    Write('&'); // release what's been held
                    Write(AmpToken);
                    _bufferState = 0;
                }
            }
            else if (value == '\n') // detect non escaped \n
            {
                base.Write("&#10;");
                return;
            }
            base.Write(value);
        }

        public override void Write(string value)
        {
            if (_bufferState > 0)
            {
                if (value == AmpToken)
                {
                    _bufferState++;
                    return; // hold it
                }
                else
                {
                    Write('&'); // release what's been held
                    _bufferState = 0;
                }
            }
            base.Write(value);
        }

        public override void Write(char[] buffer, int index, int count)
        {
            if (_bufferState > 2)
            {
                _bufferState = 0;
                base.Write('&'); // release this anyway
                string replace;
                if ((buffer != null) && ((replace = GetReplaceLength(buffer, index, count)) != null))
                {
                    base.Write(replace);
                    base.Write(buffer, index + replace.Length, count - replace.Length);
                    return;
                }
                else
                {
                    base.Write(AmpToken); // release this
                    base.Write(';'); // release this
                }
            }
            base.Write(buffer, index, count);
        }

        public override void Write(char[] buffer)
        {
            Write(buffer, 0, buffer != null ? buffer.Length : 0);
        }

        private string GetReplaceLength(char[] buffer, int index, int count)
        {
            // this is specific to the 10 character but could be adapted
            const string token = "#10;";
            // only 'count' chars starting at 'index' are valid, so compare against count
            if (count < token.Length)
                return null;

            // we test the char array to avoid string allocations
            for(int i = 0; i < token.Length; i++)
            {
                if (buffer[index + i] != token[i])
                    return null;
            }
            return token;
        }
    }

And you can use it like this:

    using (XmlTextWriter w = new XmlTextWriter(new NonXmlEscapingStreamWriter("test.xml")))
    {
        element.WriteTo(w);
    }

NOTE: Although it is capable of detecting lone \n characters, I suggest you ensure all \n are actually escaped in your original text. That means replacing \n with &#10; before you output the XML, like this:

string example = "This is a string&#10;With new lines in it&#10;";
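
If subclassing StreamWriter feels too fragile, a simpler alternative is to post-process the serialized text. The following is only a sketch, not part of the answer above: it writes the element to a string with newline handling turned off, then replaces each raw line feed with &#10;. The names NewlineEntitizer and ToXmlWithEntitizedNewlines are hypothetical. It assumes the element contains no comments, CDATA sections or processing instructions with line breaks, since a reference inserted there would be treated as literal text.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class NewlineEntitizer
{
    // Hypothetical helper: serializes the element, then rewrites every
    // raw line feed in the output as the character reference &#10;.
    public static string ToXmlWithEntitizedNewlines(XElement element)
    {
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            // Leave \n untouched so the writer does not expand it to \r\n.
            NewLineHandling = NewLineHandling.None
        };

        var buffer = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(buffer, settings))
        {
            element.WriteTo(writer);
        }

        // Indentation is off by default, so the only line feeds left
        // are the ones that came from the content itself.
        return buffer.ToString().Replace("\n", "&#10;");
    }

    static void Main()
    {
        var element = new XElement("NodeName",
            new XText("This is a string\nWith new lines in it\n"));

        Console.WriteLine(ToXmlWithEntitizedNewlines(element));
        // <NodeName>This is a string&#10;With new lines in it&#10;</NodeName>
    }
}
```

The trade-off is that the whole document is held in memory as a string, which is fine for small exports but less suitable for very large files.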