Merge two XML files, one of which is non-conformant, in C#
I have two XML files which need to be merged into one file. When I try to merge them, I get an error saying that one of them does not conform.
The offending XML file looks something like:
<letter>
<to>
<participant>
<name>Joe Bethersonton</name>
<PostalAddress>Apartment 23R, 11454 Pruter Street</PostalAddress>
<Town>Fargo, North Dakota, USA</Town>
<ZipCode>50504</ZipCode>
</participant>
</to>
<from>
<participant>
<name>Jon Doe</name>
<PostalAddress>52 Generic Street</PostalAddress>
<Town>Romford, Essex, UK</Town>
<ZipCode>RM11 2TH</ZipCode>
</participant>
</from>
</letter>
I am trying to merge the two files using the following code snippet:
using System.Data;
using System.Xml;

try
{
DataSet ds = new DataSet();
DataSet ds2 = new DataSet();
XmlTextReader reader1 = new XmlTextReader("C:\\File1.xml");
XmlTextReader reader2 = new XmlTextReader("C:\\File2.xml");
ds.ReadXml(reader1);
ds2.ReadXml(reader2);
ds.Merge(ds2);
}
catch(System.Exception ex)
{
Console.WriteLine(ex.Message);
}
This gives the following error:
The same table 'participant' cannot be the child table in two nested relations.
The two XML files are both encoded in UTF-16, which makes combining them by a simple text read and write difficult.
My required end result is one XML file with the contents of the first XML file followed by the contents of the second XML file, with an enclosing root tag around the whole lot and a header at the top.
Any ideas?
Thanks, Rik
In my opinion, the XML you provided is just fine. I suggest you use the following code and avoid the DataSet class altogether:
using System.Xml.Linq;

XDocument doc1 = XDocument.Load("C:\\File1.xml");
XDocument doc2 = XDocument.Load("C:\\File2.xml");
var result = new XDocument(new XElement("Root", doc1.Root, doc2.Root));
result will contain an XML document with "Root" as the root element, holding the content of file 1 followed by the content of file 2.
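The question also asks for a header at the top and UTF-16 output. Below is a minimal sketch of one way to do that, assuming System.Xml.Linq: it wraps both root elements in a new root ("Root" is just the placeholder used above; the class and method names XmlMerger, Merge and SaveUtf16 are hypothetical) and writes the result through an XmlWriter configured for UTF-16, which emits the `<?xml ... encoding="utf-16"?>` header itself. XDocument.Load detects the encoding of the input files from their byte order mark, so the UTF-16 inputs need no special handling.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlMerger
{
    // Wraps the root elements of both documents in a new root element.
    // XElement clones content that already has a parent, so the input documents stay untouched.
    public static XDocument Merge(XDocument first, XDocument second, string rootName = "Root")
    {
        return new XDocument(new XElement(rootName, first.Root, second.Root));
    }

    // Saves as UTF-16 (little-endian, with BOM); the XmlWriter writes the
    // <?xml version="1.0" encoding="utf-16"?> header based on its Encoding setting.
    public static void SaveUtf16(XDocument doc, string path)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UnicodeEncoding(false, true),
            Indent = true
        };
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }
}
```

For the files in the question this would be called as XmlMerger.SaveUtf16(XmlMerger.Merge(XDocument.Load("C:\\File1.xml"), XDocument.Load("C:\\File2.xml")), "C:\\Merged.xml"), where Merged.xml is a made-up output path.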
Update:
If you need to use XmlDocument, you can use this code:
using System.Xml;

XmlDocument doc1 = new XmlDocument();
XmlDocument doc2 = new XmlDocument();
doc1.Load("C:\\File1.xml");
doc2.Load("C:\\File2.xml");
XmlDocument result = new XmlDocument();
result.AppendChild(result.CreateElement("Root"));
result.DocumentElement.AppendChild(result.ImportNode(doc1.DocumentElement, true));
result.DocumentElement.AppendChild(result.ImportNode(doc2.DocumentElement, true));
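The header and UTF-16 requirements can be met with XmlDocument as well. This is a sketch with hypothetical class and method names: CreateXmlDeclaration adds the header node, ImportNode(..., true) deep-copies each source root into the new document, and the XmlWriter settings decide the actual output encoding. In my understanding, when saving through an XmlWriter, XmlDocument.Save lets the writer produce the declaration, so the encoding named in the header matches the bytes written.

```csharp
using System.Text;
using System.Xml;

public static class XmlDocumentMerger
{
    public static XmlDocument Merge(XmlDocument first, XmlDocument second, string rootName = "Root")
    {
        var result = new XmlDocument();
        // Header at the top of the document.
        result.AppendChild(result.CreateXmlDeclaration("1.0", "utf-16", null));
        XmlElement root = result.CreateElement(rootName);
        result.AppendChild(root);
        // ImportNode(node, true) deep-copies each source root into the new document.
        root.AppendChild(result.ImportNode(first.DocumentElement, true));
        root.AppendChild(result.ImportNode(second.DocumentElement, true));
        return result;
    }

    // The XmlWriter's Encoding determines the bytes written and the encoding named in the header.
    public static void SaveUtf16(XmlDocument doc, string path)
    {
        var settings = new XmlWriterSettings { Encoding = new UnicodeEncoding(false, true), Indent = true };
        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            doc.Save(writer);
        }
    }
}
```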
I suspect the solution is to provide a schema. DataSet.Merge doesn't know what to do with two sets of elements with the same name. It attempts to infer a schema, but that doesn't work out so well here.
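To see the inference problem in isolation, the sketch below feeds a trimmed-down copy of the letter from the question to DataSet.ReadXml (the class and member names are hypothetical). In my understanding the exception is already raised while ReadXml infers the schema, because participant ends up as a nested child of both the to and from tables, so Merge is never reached. The catch is deliberately broad, since I would not want to rely on the exact exception type.

```csharp
using System;
using System.Data;
using System.IO;

public static class InferenceDemo
{
    // Trimmed-down copy of the letter: <participant> appears under both <to> and <from>.
    public const string Letter =
        "<letter>" +
        "<to><participant><name>Joe Bethersonton</name><ZipCode>50504</ZipCode></participant></to>" +
        "<from><participant><name>Jon Doe</name><ZipCode>RM11 2TH</ZipCode></participant></from>" +
        "</letter>";

    // Returns the error message if ReadXml cannot infer a schema, or null if it succeeds.
    public static string TryReadXml(string xml)
    {
        var ds = new DataSet();
        try
        {
            ds.ReadXml(new StringReader(xml));
            return null;
        }
        catch (Exception ex)
        {
            return ex.Message;
        }
    }
}
```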
According to this thread on MSDN, this is a limitation of the DataSet class:
The DataSet class in .NET 2.0 (Visual Studio 2005) still has the limitation of not supporting different nested tables with the same name. Therefore you will have to introduce an XML transform to pre-process the XML (and schemas) before you load them up into the DataSet.
Of course, the way that's phrased suggests that a newer version might have fixed this. Unfortunately, I can't confirm that, as the original answer was posted back in 2005.
This knowledge base article seems to indicate that this behavior is "by design", albeit in a slightly different context.
A better explanation of why this behavior is occurring is also given on this thread:
When ADO reads XML into a DataSet, it creates DataTables to contain each type of element it encounters. Each table is uniquely identified by its name. You can't have two different tables named "PayList".
Also, a given table can have any number of parent tables, but only one of its parent relations can be nested - otherwise, a given record would get written to the XML multiple times, as a child of each of its parent rows.
It's extremely convenient that the DataSet's ReadXml method can infer the schema of the DataSet as it reads its input, but the XML has to conform to certain constraints if it's going to be readable. The XML you've got doesn't. So you have two alternatives: you can change the XML, or you can write your own method to populate the DataSet.
If it were me, I'd write an XSLT transform that took the input XML and turned PayList elements into either MatrixPayList or NonMatrixPaylist elements. Then I'd pass its output to the DataSet.
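Here is a minimal sketch of that XSLT idea applied to the letter format from the question: an identity transform that renames participant according to its parent, so that each table name has only one nested parent. The element names ToParticipant and FromParticipant are hypothetical, as are the class and method names; XslCompiledTransform is the XSLT 1.0 processor that ships with .NET.

```csharp
using System.Data;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class RenameTransform
{
    // Identity transform plus two more specific templates that rename <participant>.
    private const string Xslt = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@*|node()"">
    <xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>
  </xsl:template>
  <xsl:template match=""to/participant"">
    <ToParticipant><xsl:apply-templates select=""@*|node()""/></ToParticipant>
  </xsl:template>
  <xsl:template match=""from/participant"">
    <FromParticipant><xsl:apply-templates select=""@*|node()""/></FromParticipant>
  </xsl:template>
</xsl:stylesheet>";

    public static string Apply(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Xslt)))
        {
            xslt.Load(styleReader);
        }
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }

    // After renaming, schema inference no longer sees one table with two nested parents.
    public static DataSet LoadIntoDataSet(string xml)
    {
        var ds = new DataSet();
        ds.ReadXml(new StringReader(Apply(xml)));
        return ds;
    }
}
```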
Using XmlDocument or XDocument to read in and manipulate the XML files is another possible workaround. For an example, see "Merging two xml files LINQ".
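Following the quoted suggestion to write your own method to populate the DataSet, XDocument can also be used to fill a DataSet by hand. The flat layout below (a single participant table with hypothetical LetterIndex and Role columns recording which file, and which of to/from, each row came from) is my own assumption rather than something from the thread; the other column names mirror the question's elements.

```csharp
using System.Data;
using System.Xml.Linq;

public static class ParticipantLoader
{
    // Builds one flat "participant" table instead of letting ReadXml infer nested tables.
    // LetterIndex and Role are hypothetical columns: the index of the input document,
    // and whether the participant sat under <to> or <from>.
    public static DataSet Load(params XDocument[] letters)
    {
        var ds = new DataSet("Letters");
        DataTable table = ds.Tables.Add("participant");
        table.Columns.Add("LetterIndex", typeof(int));
        table.Columns.Add("Role", typeof(string));
        foreach (string name in new[] { "name", "PostalAddress", "Town", "ZipCode" })
            table.Columns.Add(name, typeof(string));

        for (int i = 0; i < letters.Length; i++)
        {
            // Direct children of <letter> are the <to> and <from> elements.
            foreach (XElement role in letters[i].Root.Elements())
            {
                foreach (XElement p in role.Elements("participant"))
                {
                    // A missing child element yields null, which leaves that column as DBNull.
                    table.Rows.Add(i, role.Name.LocalName,
                        (string)p.Element("name"), (string)p.Element("PostalAddress"),
                        (string)p.Element("Town"), (string)p.Element("ZipCode"));
                }
            }
        }
        return ds;
    }
}
```

Passing both loaded files to Load puts every participant from both letters into the same table, at the cost of a flat shape rather than the original letter layout.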
I found a solution that uses serialization: first infer the schema, then serialize the schema and remove the relationship constraints (this tricks the DataSet into thinking that it created the schema itself), then load this new schema into a DataSet.
This new dataset will be able to load both your xml files. More details behind this trick: Serialization Issue when using WriteXML method