Merge two XElements
I'm not quite sure how to ask this, or if this even exists, but I need to merge two XElements into a single element, with one taking precedence over the other.
The preference here is VB.NET and Linq, but any language would be helpful if it demonstrates how to do this without me writing code to manually pick apart and resolve every single element and attribute.
For example, let's say I have two elements. Humor me on them being as different as they are.
1.
<HockeyPlayer height="6.0" hand="left">
  <Position>Center</Position>
  <Idol>Gordie Howe</Idol>
</HockeyPlayer>
2.
<HockeyPlayer height="5.9" startinglineup="yes">
  <Idol confirmed="yes">Wayne Gretzky</Idol>
</HockeyPlayer>
The result of a merge would be
<HockeyPlayer height="6.0" hand="left" startinglineup="yes">
  <Position>Center</Position>
  <Idol confirmed="yes">Gordie Howe</Idol>
</HockeyPlayer>
Notice a few things: the height attribute value of #1 overrode #2. The hand attribute and value were simply copied over from #1 (it doesn't exist in #2). The startinglineup attribute and value from #2 were copied over (it doesn't exist in #1). The Position element in #1 was copied over (it doesn't exist in #2). The Idol element value in #1 overrides #2, but #2's confirmed attribute (it doesn't exist in #1) is copied over.
In short, #1 takes precedence over #2 where there is a conflict (meaning both have the same element or attribute), and where there is no conflict, both are copied to the final result.
I've tried searching on this, but just can't seem to find anything, possibly because the words I'm using to search are too generic. Any thoughts or solutions (esp. for Linq)?
For the sake of others looking for the same thing, as I assume both the people contributing have long since lost interest... I needed to do something similar but a little more complete. It is still not totally complete though: as the XML doc comment says, it does not handle non-element content well, but I don't need it to, as my non-element content is either text or unimportant. Feel free to enhance and re-post... Oh, and it's C# 4.0, as that's what I use...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

/// <summary>
/// Provides facilities to merge 2 XElement or XML files.
/// <para>
/// Where the LHS holds an element with non-element content and the RHS holds
/// a tree, the LHS non-element content will be applied as text and the RHS
/// tree ignored.
/// </para>
/// <para>
/// This does not handle anything other than element and text nodes (in fact
/// anything other than element is treated as text). Thus comments in the
/// source XML are likely to be lost.
/// </para>
/// <remarks>You can pass <see cref="XDocument.Root"/> if you have XDocs
/// to work with:
/// <code>
/// XDocument mergedDoc = new XDocument(MergeElements(lhsDoc.Root, rhsDoc.Root));
/// </code></remarks>
/// </summary>
public class XmlMerging
{
    /// <summary>
    /// Produce an XML file that is made up of the unique data from both
    /// the LHS file and the RHS file. Where there are duplicates the LHS will
    /// be treated as master
    /// </summary>
    /// <param name="lhsPath">XML file to base the merge off. This will override
    /// the RHS where there are clashes</param>
    /// <param name="rhsPath">XML file to enrich the merge with</param>
    /// <param name="resultPath">The fully qualified file name in which to
    /// write the resulting merged XML</param>
    /// <param name="options"> Specifies the options to apply when saving.
    /// Default is <see cref="SaveOptions.OmitDuplicateNamespaces"/></param>
    public static bool TryMergeXmlFiles(string lhsPath, string rhsPath,
        string resultPath, SaveOptions options = SaveOptions.OmitDuplicateNamespaces)
    {
        try
        {
            // pass the caller's save options through to the real merge
            MergeXmlFiles(lhsPath, rhsPath, resultPath, options);
        }
        catch (Exception)
        {
            // could integrate your logging here
            return false;
        }
        return true;
    }
    /// <summary>
    /// Produce an XML file that is made up of the unique data from both the LHS
    /// file and the RHS file. Where there are duplicates the LHS will be treated
    /// as master
    /// </summary>
    /// <param name="lhsPath">XML file to base the merge off. This will override
    /// the RHS where there are clashes</param>
    /// <param name="rhsPath">XML file to enrich the merge with</param>
    /// <param name="resultPath">The fully qualified file name in which to write
    /// the resulting merged XML</param>
    /// <param name="options"> Specifies the options to apply when saving.
    /// Default is <see cref="SaveOptions.OmitDuplicateNamespaces"/></param>
    public static void MergeXmlFiles(string lhsPath, string rhsPath,
        string resultPath, SaveOptions options = SaveOptions.OmitDuplicateNamespaces)
    {
        XElement result =
            MergeElements(XElement.Load(lhsPath), XElement.Load(rhsPath));
        result.Save(resultPath, options);
    }
    /// <summary>
    /// Produce a resulting <see cref="XElement"/> that is made up of the unique
    /// data from both the LHS element and the RHS element. Where there are
    /// duplicates the LHS will be treated as master
    /// </summary>
    /// <param name="lhs">XML Element tree to base the merge off. This will
    /// override the RHS where there are clashes</param>
    /// <param name="rhs">XML element tree to enrich the merge with</param>
    /// <returns>A merge of the left hand side and right hand side element
    /// trees treating the LHS as master in conflicts</returns>
    public static XElement MergeElements(XElement lhs, XElement rhs)
    {
        // if either of the sides of the merge are empty then return the other...
        // if they both are then we return null
        if (rhs == null) return lhs;
        if (lhs == null) return rhs;
        // Otherwise build a new result based on the root of the lhs (again lhs
        // is taken as master)
        XElement result = new XElement(lhs.Name);
        MergeAttributes(result, lhs.Attributes(), rhs.Attributes());
        // now add the lhs child elements merged to the RHS elements if there are any
        MergeSubElements(result, lhs, rhs);
        return result;
    }
    /// <summary>
    /// Enrich the passed in <see cref="XElement"/> with the contents of both
    /// attribute collections.
    /// Again where the RHS conflicts with the LHS, the LHS is deemed the master
    /// </summary>
    /// <param name="elementToUpdate">The element to take the merged attribute
    /// collection</param>
    /// <param name="lhs">The master set of attributes</param>
    /// <param name="rhs">The attributes to enrich the merge</param>
    private static void MergeAttributes(XElement elementToUpdate,
        IEnumerable<XAttribute> lhs, IEnumerable<XAttribute> rhs)
    {
        // Add in the attribs of the lhs... we will only add new attribs from
        // the rhs; duplicates will be ignored as lhs is master
        elementToUpdate.Add(lhs);
        // collapse the attribute names to save multiple evaluations... also why
        // we ain't putting this in as a sub-query
        List<XName> lhsAttributeNames =
            lhs.Select(attribute => attribute.Name).ToList();
        // so add in any missing attributes
        elementToUpdate.Add(rhs.Where(attribute =>
            !lhsAttributeNames.Contains(attribute.Name)));
    }
    /// <summary>
    /// Enrich the passed in <see cref="XElement"/> with the contents of both
    /// <see cref="XElement.Elements()"/> subtrees.
    /// Again where the RHS conflicts with the LHS, the LHS is deemed the master.
    /// Where the passed elements do not have element subtrees, but do have text
    /// content that will be used. Again the LHS will dominate
    /// </summary>
    /// <remarks>Where the LHS has text content and no subtree, but the RHS has
    /// a subtree; the LHS text content will be used and the RHS tree ignored.
    /// This may be unexpected but is consistent with other .NET XML
    /// operations</remarks>
    /// <param name="elementToUpdate">The element to take the merged element
    /// collection</param>
    /// <param name="lhs">The element from which to extract the master
    /// subtree</param>
    /// <param name="rhs">The element from which to extract the subtree to
    /// enrich the merge</param>
    private static void MergeSubElements(XElement elementToUpdate,
        XElement lhs, XElement rhs)
    {
        // see below for the special case where there are no children on the LHS
        if (lhs.Elements().Any())
        {
            // collapse the element names to a list to save multiple evaluations...
            // also why we ain't putting this in as a sub-query later
            List<XName> lhsElementNames =
                lhs.Elements().Select(element => element.Name).ToList();
            // Add in the elements of the lhs and merge in any elements of the
            // same name on the RHS
            elementToUpdate.Add(
                lhs.Elements().Select(
                    lhsElement =>
                        MergeElements(lhsElement, rhs.Element(lhsElement.Name))));
            // so add in any missing elements from the rhs
            elementToUpdate.Add(rhs.Elements().Where(element =>
                !lhsElementNames.Contains(element.Name)));
        }
        else
        {
            // special case for elements where they have no element children
            // but still have content:
            // use the lhs text value if it is there
            if (!string.IsNullOrEmpty(lhs.Value))
            {
                elementToUpdate.Value = lhs.Value;
            }
            // if it isn't then see if we have any children on the right
            else if (rhs.Elements().Any())
            {
                // we do so shove them in the result unaltered
                elementToUpdate.Add(rhs.Elements());
            }
            else
            {
                // nope then use the text value (doesn't matter if it is empty
                // as we have nothing better elsewhere)
                elementToUpdate.Value = rhs.Value;
            }
        }
    }
}
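To check the class against the original question, a short console sketch like the following could be used. It assumes the XmlMerging class above is compiled into the same project; MergeDemo is just an illustrative name, and the XML is the sample from the question.

```csharp
using System;
using System.Xml.Linq;

public static class MergeDemo
{
    public static void Main()
    {
        // Element #1 from the question: the master (LHS) side.
        XElement lhs = XElement.Parse(
            @"<HockeyPlayer height=""6.0"" hand=""left"">
                <Position>Center</Position>
                <Idol>Gordie Howe</Idol>
              </HockeyPlayer>");

        // Element #2 from the question: only fills in what #1 lacks.
        XElement rhs = XElement.Parse(
            @"<HockeyPlayer height=""5.9"" startinglineup=""yes"">
                <Idol confirmed=""yes"">Wayne Gretzky</Idol>
              </HockeyPlayer>");

        // Assumption: XmlMerging (from the answer above) is in this project.
        XElement merged = XmlMerging.MergeElements(lhs, rhs);
        Console.WriteLine(merged);
    }
}
```

Tracing the code by hand, the printed element should carry height="6.0", hand="left" and startinglineup="yes", with Position copied from #1 and Idol keeping "Gordie Howe" while picking up confirmed="yes" from #2, which is the result asked for in the question.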
Here's a console app that produces the result listed in your question. It uses recursion to process each sub-element. The one thing it doesn't check for is child elements that appear in Elem2 but not in Elem1, but hopefully this will get you started towards a solution.
I'm not sure if I would say this is the best possible solution, but it does work.
Imports System.Linq
Imports System.Xml.Linq

Module Module1
    Function MergeElements(ByVal Elem1 As XElement, ByVal Elem2 As XElement) As XElement
        ' Nothing to merge with: Elem1 is the result
        If Elem2 Is Nothing Then
            Return Elem1
        End If
        Dim result = New XElement(Elem1.Name)
        ' All attributes of Elem1 win
        For Each attr In Elem1.Attributes
            result.Add(attr)
        Next
        Dim Elem1AttributeNames = From attr In Elem1.Attributes _
                                  Select attr.Name
        ' Attributes of Elem2 are only added when Elem1 lacks them
        For Each attr In Elem2.Attributes
            If Not Elem1AttributeNames.Contains(attr.Name) Then
                result.Add(attr)
            End If
        Next
        If Elem1.Elements().Count > 0 Then
            ' Recurse into each child of Elem1, pairing it with the
            ' same-named child of Elem2 (if any)
            For Each elem In Elem1.Elements
                result.Add(MergeElements(elem, Elem2.Element(elem.Name)))
            Next
        Else
            result.Value = Elem1.Value
        End If
        Return result
    End Function

    Sub Main()
        Dim Elem1 = <HockeyPlayer height="6.0" hand="left">
                        <Position>Center</Position>
                        <Idol>Gordie Howe</Idol>
                    </HockeyPlayer>
        Dim Elem2 = <HockeyPlayer height="5.9" startinglineup="yes">
                        <Idol confirmed="yes">Wayne Gretzky</Idol>
                    </HockeyPlayer>
        Console.WriteLine(MergeElements(Elem1, Elem2))
        Console.ReadLine()
    End Sub
End Module
Edit: I just noticed that the function was missing As XElement. I'm actually surprised that it worked without that! I work with VB.NET every day, but it has some quirks that I still don't totally understand.
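The gap mentioned in this answer (child elements that appear only in the second element) can probably be closed with one extra query after the recursion. What follows is a minimal sketch of that idea, not a tested library routine. It is written in C# to match the other answer, but the same XElement calls are available from VB.NET. MergeSketch, Merge and the Team element are hypothetical names made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MergeSketch
{
    // Hypothetical helper: 'primary' wins every conflict, 'secondary'
    // only contributes attributes and children that 'primary' lacks.
    public static XElement Merge(XElement primary, XElement secondary)
    {
        if (secondary == null) return new XElement(primary);

        var result = new XElement(primary.Name);

        // Attributes: all of primary's, then the ones only secondary has.
        result.Add(primary.Attributes());
        result.Add(secondary.Attributes()
            .Where(a => primary.Attribute(a.Name) == null));

        // Leaf with text: primary's text wins, secondary's content is ignored.
        if (!primary.HasElements && primary.Value.Length > 0)
        {
            result.Value = primary.Value;
            return result;
        }

        // Children of primary, each merged with the same-named child of secondary.
        foreach (var child in primary.Elements())
            result.Add(Merge(child, secondary.Element(child.Name)));

        // The missing step: children that exist only in secondary.
        result.Add(secondary.Elements()
            .Where(e => primary.Element(e.Name) == null));

        // Neither side had child elements: fall back to secondary's text.
        if (!result.HasElements && secondary.Value.Length > 0)
            result.Value = secondary.Value;

        return result;
    }

    public static void Main()
    {
        var first = XElement.Parse(
            @"<HockeyPlayer height=""6.0"" hand=""left"">
                <Position>Center</Position>
                <Idol>Gordie Howe</Idol>
              </HockeyPlayer>");
        // <Team> is an invented extra child that exists only in the second element.
        var second = XElement.Parse(
            @"<HockeyPlayer height=""5.9"" startinglineup=""yes"">
                <Idol confirmed=""yes"">Wayne Gretzky</Idol>
                <Team>Red Wings</Team>
              </HockeyPlayer>");

        Console.WriteLine(Merge(first, second));
    }
}
```

With this input the result should keep everything from the question's expected output and additionally end with the Team element taken from the second element. Note that the sketch matches children by name only, so repeated sibling names (several elements with the same name under one parent) would all be paired with the first same-named child on the other side.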