Why does this XML file load slowly?

I have some very simple code:

        XmlDocument doc = new XmlDocument();
        Console.WriteLine("loading");
        doc.Load(url);
        Console.WriteLine("loaded");

        XmlNodeList nodeList = doc.GetElementsByTagName("p");

        foreach(XmlNode node in nodeList)
        {
            Console.WriteLine(node.ChildNodes[0].Value);
        }
        return source;

I'm working on this file and it takes two minutes to load. Why does it take so long? I tried both fetching the file from the net and loading a local file.


I imagine it's the DTD of the page that's taking so long to load. Since the DTD defines the entities the page uses, you shouldn't disable DTD processing, so you're probably better off not going down the XmlDocument path at all.

Given the inner workings of the Wikipedia parser (a right mess), I'd say it's a big leap to assume it's going to produce well-formed XHTML every time.

Use HTML Agility Pack to parse (then you can convert to XmlDocument a little more easily if required, IIRC).
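
As a rough illustration of the HTML Agility Pack route, here is a minimal sketch that pulls the text of every `p` element out of an HTML string. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `HapSketch` and the method `ParagraphTexts` are made up for this example and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper for illustration only; not part of HTML Agility Pack.
public static class HapSketch
{
    public static List<string> ParagraphTexts(string html)
    {
        // HtmlDocument tolerates markup that is not well-formed XML
        // and never fetches a DTD.
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//p");
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText keeps entities such as &amp; as written; DeEntitize decodes them.
            result.Add(HtmlEntity.DeEntitize(node.InnerText));
        }
        return result;
    }
}
```

To read straight from a URL instead of a string, `new HtmlWeb().Load(url)` should return an `HtmlDocument` that can be queried the same way.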

If you really want to go down the XmlDocument route you can keep a local cache of the HTML DTDs. See this post, this post and this post for details.
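
One way to get the effect of a local DTD cache without maintaining the files yourself is the `XmlPreloadedResolver` class in `System.Xml.Resolvers`, which ships with copies of the XHTML 1.0 DTDs and entity files. The sketch below is one possible setup, not necessarily what the linked posts describe; the class name `LocalDtdSketch` and the method `LoadXhtml` are invented for this example, and it assumes the document declares one of the XHTML 1.0 doctypes.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Resolvers;

// Hypothetical helper for illustration only.
public static class LocalDtdSketch
{
    public static XmlDocument LoadXhtml(TextReader input)
    {
        // Serves the XHTML 1.0 DTDs and entity sets from memory, so the
        // parser does not need to contact www.w3.org for them.
        var resolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);

        var settings = new XmlReaderSettings
        {
            // DTD processing stays on so entities such as &nbsp; still resolve.
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = resolver
        };

        var doc = new XmlDocument();
        // Also hand the resolver to the document in case it resolves anything itself.
        doc.XmlResolver = resolver;

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

This only helps if the page really is well-formed XHTML; a document that is not well-formed will still throw an `XmlException` on load.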


It is because XmlDocument doesn't just load your XML into a nice class hierarchy; it also goes and fetches the DTD declared in the document, along with the entity files that DTD references. Run Fiddler and you will see the calls to fetch:

http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent

These all took me about 20 seconds to fetch.
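
If you would rather see these requests from code than in Fiddler, a small logging resolver can record every external resource the parser asks for. This is a sketch under assumptions: `LoggingResolver` and `LoadWithLogging` are names invented here, and whether any requests are made at all depends on the .NET version and its default resolver behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Xml;

// Hypothetical resolver for illustration: records and times each lookup.
public sealed class LoggingResolver : XmlUrlResolver
{
    public List<Uri> Requested { get; } = new List<Uri>();

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        Requested.Add(absoluteUri);
        var timer = Stopwatch.StartNew();
        try
        {
            // Delegates to the normal URL resolver, which opens the resource.
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);
        }
        finally
        {
            // Measures the time to open the resource, not to read it fully.
            Console.WriteLine("{0} took {1} ms", absoluteUri, timer.ElapsedMilliseconds);
        }
    }

    // Loads a document with logging enabled; url is whatever you pass to doc.Load today.
    public static XmlDocument LoadWithLogging(string url)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = new LoggingResolver();
        doc.Load(url);
        return doc;
    }
}
```

A document with no DOCTYPE should produce no log lines, which makes it easy to confirm that the DTD is what triggers the extra requests.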
