What is the modern way to do XML transforms in Python?
I need to do bidirectional transforms between some significantly complex XML and flat-file formats in Python. I'm out of date and don't know how people are solving this problem in the far-off future year of 2011.
I've gotten back up to date with the various Python XML libraries, but it's been 8 years since my last time in XSLT hell, and after googling around I was amazed that it's still common.
So how do you go about doing complex XML data transforms?
I'd like to do this in Python because the documents are not direct mappings and there is some processing and calculation required. But I'd still like to pass as much off to a rule engine as possible.
Edit: To be clear, I'm interested in techniques more than in specific libraries or tools, but please do post those too. I'm trying hard to avoid the word "pattern" here, but surely this is a common problem.
Edit 2: I still don't think there were any good answers on general techniques, but the original problem I had was solved using the Bots EDI framework for doing document translations. It is quite heavily focussed on EDI but can be used for generic translation. It was a heavy-weight solution though.
For Python, here is a comprehensive list of available XML libs/modules:
http://wiki.python.org/moin/PythonXml
If you are looking for something simpler than XSLT, XMLStarlet is a set of command-line tools which may be of interest to you:
http://xmlstar.sourceforge.net/
Like any command-line tool, it is not made specifically for Python, but it can easily be integrated into a Python script.
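As a rough sketch of that integration, the standard `subprocess` module can call XMLStarlet's `sel` subcommand and collect its output. The helper names `build_sel_command` and `select_values` are made up for this illustration, and the sketch assumes the executable is on the PATH under the name `xmlstarlet` (it may be installed under a different name on some systems).

```python
import shutil
import subprocess


def build_sel_command(xpath, xml_path, executable="xmlstarlet"):
    """Build the argument list for `xmlstarlet sel -t -v XPATH FILE`.

    Hypothetical helper: kept separate so the command can be inspected
    without actually running the tool.
    """
    return [executable, "sel", "-t", "-v", xpath, xml_path]


def select_values(xpath, xml_path, executable="xmlstarlet"):
    """Run XMLStarlet and return its output split into lines."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable!r} not found on PATH")
    result = subprocess.run(
        build_sel_command(xpath, xml_path, executable),
        capture_output=True,  # collect stdout/stderr instead of printing
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a nonzero exit
    )
    return result.stdout.splitlines()
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the XPath expression contains spaces or quotes.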
Although it’s only useful for writing XML, XMLwitch is freakin’ amazing. For doing non-XML to XML transformations, I highly recommend it!
Hi Dimitre, I use lxml to manipulate XML in Python.
I have posted some techniques for managing large numbers of XML schemas, namespaces, etc.: Automatic XSD validation
One suggestion is to use the full XPath whenever possible.
For example, if I had a complex type:
<person>
<name/>
<age/>
</person>
I could run into a problem if I don't use /person/name
and that complex type later changes to:
<person>
<name/>
<age/>
<child>
<son>
<name/>
<age/>
</son>
</child>
</person>
The reason is that 'name' now exists in multiple places, so a loose search for it becomes ambiguous.
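A minimal sketch of that ambiguity, using the second document from above. To keep it self-contained it uses the standard library's `xml.etree.ElementTree` rather than lxml, and the text values (Alice, Bob, and the ages) are made up. ElementTree's XPath subset does not accept absolute paths on an element, so `./name` relative to the `<person>` root stands in for `/person/name`.

```python
import xml.etree.ElementTree as ET

# Sample data is invented for illustration only.
DOC = """
<person>
  <name>Alice</name>
  <age>40</age>
  <child>
    <son>
      <name>Bob</name>
      <age>10</age>
    </son>
  </child>
</person>
"""

root = ET.fromstring(DOC)  # root is the <person> element

# Loose search: matches every <name> at any depth, parent and son alike.
loose = [el.text for el in root.findall(".//name")]

# Explicit paths: each one states exactly which <name> is meant.
person_name = [el.text for el in root.findall("./name")]
son_name = [el.text for el in root.findall("./child/son/name")]

print(loose)        # ['Alice', 'Bob']
print(person_name)  # ['Alice']
print(son_name)     # ['Bob']
```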
Also be aware of schemas that allow for many "persons", as in this example. You may need to provide a "key" with your XPath to determine which person you are referencing. You could have 5 or 6 persons in your XML; their XPaths would be identical but their names unique, so the name would then be the key you use to reference each particular person.
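One way to express such a key is an XPath predicate on the child's text. The sketch below again uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies; the `<people>` wrapper element, the sample values, and the `age_of` helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Invented sample: several <person> elements share the same path.
DOC = """
<people>
  <person><name>Alice</name><age>40</age></person>
  <person><name>Bob</name><age>35</age></person>
  <person><name>Carol</name><age>52</age></person>
</people>
"""

root = ET.fromstring(DOC)


def age_of(root, name):
    """Return the age of the person whose <name> equals `name`, or None.

    The name is interpolated into the path, so this simple version
    does not handle names that contain a single quote.
    """
    person = root.find(f"./person[name='{name}']")
    return None if person is None else int(person.findtext("age"))


print(age_of(root, "Bob"))   # 35
print(age_of(root, "Dave"))  # None
```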
I would also suggest writing your own wrapper functions around lxml that suit your needs. What I did was create an xmlUtil.py file containing the generic XML functions I needed. I then created a myXML.py file that holds the assumptions about my specific XML and its behavior. The xmlUtil.py functions only accept the XML content, so if I decide to use something other than lxml it will be easy to change.
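A rough sketch of that two-layer layout, shown in one block for brevity. The function names are invented, and the standard library's `xml.etree.ElementTree` stands in for lxml here; because callers only ever pass and receive XML content as strings, swapping the library would only touch the generic layer.

```python
import xml.etree.ElementTree as ET

# --- xmlUtil.py: generic helpers, the only code that touches the XML library ---


def find_texts(xml_content, path):
    """Return the text of every element matching `path` in the XML string."""
    root = ET.fromstring(xml_content)
    return [el.text for el in root.findall(path)]


def set_text(xml_content, path, value):
    """Return new XML content with the first match of `path` set to `value`."""
    root = ET.fromstring(xml_content)
    target = root.find(path)
    if target is None:
        raise KeyError(path)
    target.text = value
    return ET.tostring(root, encoding="unicode")


# --- myXML.py: assumptions about one specific document type live here ---


def person_name(xml_content):
    """Name of the top-level person, or None if the element is missing."""
    names = find_texts(xml_content, "./name")
    return names[0] if names else None


def rename_person(xml_content, new_name):
    """Return the document with the top-level person's name replaced."""
    return set_text(xml_content, "./name", new_name)
```

For example, `person_name(rename_person("<person><name>Alice</name></person>", "Zoe"))` returns `"Zoe"` without the caller ever seeing a parsed tree.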
Hope some of this helps. I wish I could be more helpful, but the question is very open-ended.
Although it's also 'just' a library, or rather a framework, inxs is an approach to avoiding XSLT hell by using rule-based transformations in Python.
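To illustrate the general idea of rule-based transformation (this is not the inxs API, just a plain-Python sketch using the standard library), the mapping between XML and a flat record can be kept in a declarative table that drives both directions. The rule layout, the helper names, and the sample `<person>` document are all assumptions, and the sketch only handles paths that are a single child tag.

```python
import xml.etree.ElementTree as ET

# Each rule: (flat-file column, child tag in the XML, parser, formatter).
# Hypothetical layout; only single-tag paths are supported in this sketch.
RULES = [
    ("name", "name", str, str),
    ("age", "age", int, str),
]


def xml_to_row(xml_content, rules=RULES):
    """Apply the rules to turn one XML document into a flat dict."""
    root = ET.fromstring(xml_content)
    return {col: parse(root.findtext(tag)) for col, tag, parse, _ in rules}


def row_to_xml(row, root_tag="person", rules=RULES):
    """Apply the same rules in reverse to rebuild the XML document."""
    root = ET.Element(root_tag)
    for col, tag, _, fmt in rules:
        ET.SubElement(root, tag).text = fmt(row[col])
    return ET.tostring(root, encoding="unicode")
```

Keeping the mapping as data means the calculation-heavy parts can stay in ordinary Python functions while the routine field-to-field work is handled by the table.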