开发者

XDocument comparrison

I have two documents with a simple schema that I need to compare:

current doc:

<Sections>
  <Section Number="1"/>
  &开发者_如何学编程lt;Section Number="2"/>
  <Section Number="4"/>
  <Section Number="5"/>
</Sections>

previous doc:

<Sections>
  <Section Number="1"/>
  <Section Number="2"/>
</Sections>

The result of the comparison will be a list sections that have been added to the current doc...ie sections in the current doc that are not in the previous doc. In this example section 4 and 5 are new.

The current and previous doc can have upwards of 20,000 records. The following approach produces the results I need but seems like the wrong approach as it passes over the data sets multiple times, and takes a while to run.

get a list of the sections:

List<XElement> currenList = currentDoc.Descendants("Section").ToList();

get attributes in previous list

List<string> previousString = //get the attribute values...
//get the new sections...
var newSections = (from nodes in currentList
                   let att = nodes.Attribute("Number").Value
                   where !previousList.Contains(att)
                   select nodes) 

What is a better approach that would involve fewer passes/conversions of the datasets?


You should look at Except:

IEnumerable<int> currentList = currentDoc.Descendants("Section")
                      .Select(e => (int)e.Attribute("Number"));
IEnumerable<int> previousList = previousDoc.Descendants("Section")
                      .Select(e => (int)e.Attribute("Number"));

IEnumerable<int> newSections = currentList.Except(previousList);


You can use a sorted set to keep track.

SortedSet<string> sections = new Sort...
List<XElement> diff = new Li...

foreach (var node in previousList)
    sections.Add(node.Attribute("Section").Value());

foreach (var node in currentList)
    if (!sections.Contains(node.Attribute("Section").Value());
        diff.Add(node);

this uses a little extra memory with SortedSet, but it should run n*log(n) since the Contains() on a sorted set will run log(n).

in the end, diff should contain the list you're looking for.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜