开发者

Ideal Data Structure to Deal with XML Data

Maybe a silly question, but I usually learn a lot with those. :)

I'm working on a software that deals a lot with XML, both as input and as output, and in between a lot of processing occurs.

My first thought was to use internally a dict as a internal data structure, and from there, work my way with a process of reading and writing it.

What yo开发者_运维知识库u guys think? Any better approach, python-wise?


An XML document in general is a tree with lots of bells and whistles (attributes vs. child nodes, mixing of text with child nodes, entities, xml declarations, comments, and many more). Handling that should be left to existing, mature libraries - for Python, it's commonly agreed that lxml is the most convenient choice, followed by the stdlib ElementTree modules (by which one lxml module, lxml.etree, is so heavily inspired that incompabilities are exceptions).

These handle all that complexity and expose it in somewhat handable ways with many convenience methods (lxml's XPath support has saved me lot of code). After parsing, programs can - of course - go on to convert the trees into simpler data structures that fit the data actually modeled much better. What data structures exactly are possible and sensible depending on what you want to represent (if you misuse XML as a flat key-value storage, for instance, you could indeed go on to convert the tree into a dictionary).


It depends completely on what type of data you have in XML, what kinds of processing you'll need to do with it, what sort of output you'll need to produce from it, etc.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜