Performance: XmlReader or LINQ to XML
I have a 150 MB XML file which is used as a database in my project. Currently I'm using XmlReader
to read content from it. I want to know whether it is better to use XmlReader
or LINQ to XML for this scenario.
Note that I'm searching for an item in this XML and displaying the search result, so the search can take a long time or finish in a moment.
If you want performance, use XmlReader. It doesn't read the whole file and build a DOM tree in memory; instead, it reads the file from disk and hands you each node it finds along the way.
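For the question's search scenario, a minimal sketch of such a forward-only search might look like the following (the file path, the "item" element, and the "name" attribute are placeholders, not details from the original post):

// Minimal sketch: scan a large XML file with XmlReader without loading it into memory.
// The path, element name, and attribute name below are hypothetical placeholders.
using System.Xml;

public static class XmlSearch
{
    public static string FindItem(string path, string searchName)
    {
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (var reader = XmlReader.Create(path, settings))
        {
            while (reader.ReadToFollowing("item"))        // jump from one <item> to the next
            {
                var name = reader.GetAttribute("name");   // cheap attribute lookup, no node tree built
                if (name == searchName)
                    return reader.ReadOuterXml();         // return the matching element as raw XML
            }
        }
        return null;                                      // not found
    }
}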
A quick Google search turned up a performance comparison of XmlReader, LINQ to XML, and XmlDocument.Load:
https://web.archive.org/web/20130517114458/http://www.nearinfinity.com/blogs/joe_ferner/performance_linq_to_sql_vs.html
I would personally look at using LINQ to XML with the streaming techniques outlined in the Microsoft documentation: http://msdn.microsoft.com/en-us/library/system.xml.linq.xstreamingelement.aspx#Y1392
Here's a quick benchmark reading from a 200 MB XML file with a simple filter:
var xmlFilename = "test.xml";
//create test xml file
var initMemoryUsage = GC.GetTotalMemory(true);
var timer = System.Diagnostics.Stopwatch.StartNew();
var rand = new Random();
// to stream the XML output, XStreamingElement must be used for all parent elements of the collection (so no XDocument)
var testDoc = new XStreamingElement("root",
    Enumerable.Range(1, 10000000).Select(idx => new XElement("child", new XAttribute("id", rand.Next(0, 1000)))));
testDoc.Save(xmlFilename);
var outStat = String.Format("{0:f2} sec {1:n0} kb //linq to xml output streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);
//linq to xml not streamed
initMemoryUsage = GC.GetTotalMemory(true);
timer.Restart();
var col1 = XDocument.Load(xmlFilename).Root.Elements("child").Where(e => (int)e.Attribute("id") < 10).Select(e => (int)e.Attribute("id")).ToArray();
var stat1 = String.Format("{0:f2} sec {1:n0} kb //linq to xml input not streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);
//xmlreader
initMemoryUsage = GC.GetTotalMemory(true);
timer.Restart();
var col2 = new List<int>();
using (var reader = new XmlTextReader(xmlFilename))
{
    while (reader.ReadToFollowing("child"))
    {
        reader.MoveToAttribute("id");
        int value = Convert.ToInt32(reader.Value);
        if (value < 10)
            col2.Add(value);
    }
}
var stat2 = String.Format("{0:f2} sec {1:n0} kb //xmlreader", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);
//linq to xml streamed
initMemoryUsage = GC.GetTotalMemory(true);
timer.Restart();
var col3 = StreamElements(xmlFilename, "child").Where(e => (int)e.Attribute("id") < 10).Select(e => (int)e.Attribute("id")).ToArray();
var stat3 = String.Format("{0:f2} sec {1:n0} kb //linq to xml input streamed", timer.Elapsed.TotalSeconds, (GC.GetTotalMemory(false) - initMemoryUsage) / 1024);
//util method
public static IEnumerable<XElement> StreamElements(string filename, string elementName)
{
    using (var reader = XmlReader.Create(filename))
    {
        // lazily yield each matching element without loading the whole document into memory
        while (reader.Name == elementName || reader.ReadToFollowing(elementName))
            yield return (XElement)XElement.ReadFrom(reader);
    }
}
And here's the processing time and memory usage on my machine:
11.49 sec 225 kb // linq to xml output streamed
17.36 sec 782,312 kb // linq to xml input not streamed
6.52 sec 1,825 kb // xmlreader
11.74 sec 2,238 kb // linq to xml input streamed
Write a few benchmark tests to establish exactly what the situation is for you, and take it from there... LINQ to XML introduces a lot of flexibility...
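As an illustration of that flexibility, here's a rough sketch of the kind of query that is a one-liner in LINQ to XML but requires manual bookkeeping with a raw XmlReader (it reuses the "child"/"id" names from the benchmark above, but the grouping query itself is only an example, not code from any of the answers):

using System;
using System.Linq;
using System.Xml.Linq;

public static class QueryDemo
{
    public static void Main()
    {
        // Illustrative sketch: load the benchmark file and bucket the child elements by id range.
        var doc = XDocument.Load("test.xml");
        var buckets = doc.Root
            .Elements("child")
            .GroupBy(e => (int)e.Attribute("id") / 100)
            .Select(g => new { Bucket = g.Key, Count = g.Count() })
            .OrderBy(x => x.Bucket);

        foreach (var b in buckets)
            Console.WriteLine("ids {0}-{1}: {2}", b.Bucket * 100, b.Bucket * 100 + 99, b.Count);
    }
}

The trade-off is the same one the benchmark numbers show: XDocument.Load pulls the whole file into memory, so this style is convenient for rich queries but expensive for very large files.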