In-document schema declarations and lxml
According to the official lxml documentation, to validate an XML document against an XML Schema document, one has to:
- construct the XMLSchema object (basically, parse the schema document)
- construct the XMLParser, passing the XMLSchema object as its schema argument
- parse the actual XML document (the instance document) using the constructed parser
There can be variations, but the essence is pretty much the same no matter how you do it: the schema is specified 'externally' (as opposed to being specified inside the actual XML document).
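The three steps described above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the schema and the two instance documents are made-up inline examples, and `is_valid` is a hypothetical helper name.

```python
from io import BytesIO
from lxml import etree

# Hypothetical schema and instance documents, inlined for illustration.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note" type="xs:string"/>
</xs:schema>"""
VALID = b"<note>hello</note>"
INVALID = b"<memo>hello</memo>"

# Step 1: construct the XMLSchema object by parsing the schema document.
schema = etree.XMLSchema(etree.parse(BytesIO(XSD)))

# Step 2: construct an XMLParser, passing the schema as its `schema` argument.
parser = etree.XMLParser(schema=schema)

# Step 3: parse the instance document; validation happens while parsing.
root = etree.fromstring(VALID, parser)


def is_valid(data):
    """Return True if `data` parses and validates with the schema-aware parser."""
    try:
        etree.fromstring(data, parser)
        return True
    except etree.XMLSyntaxError:
        # A validating parser reports schema violations as XMLSyntaxError.
        return False
```

Note that the schema is chosen entirely by the calling code here; nothing in the instance document influences which schema is used.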
If you follow this procedure, validation does occur, but if I understand it correctly, it completely ignores the whole idea of the xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes.
This introduces a whole bunch of limitations. To begin with, you have to manage the instance-to-schema relation all by yourself (either store it externally or write some hack to retrieve the schema location from the root element of the instance document). You also cannot validate the document against multiple schemata (say, when each schema governs its own namespace), and so on.
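For what it's worth, the "hack" of reading the schema location from the root element might look something like the sketch below. It uses only the standard library; `schema_hints` and the sample document are hypothetical names made up for illustration, and the function only reads the hints, it does not validate anything.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(xml_text):
    """Read the xsi schema hints from the root element.

    Returns a (namespace -> location dict, noNamespaceSchemaLocation or None) pair.
    """
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI_NS, "").split()
    # Tokens alternate: namespace URI, location, namespace URI, location, ...
    pairs = dict(zip(tokens[0::2], tokens[1::2]))
    no_ns = root.get("{%s}noNamespaceSchemaLocation" % XSI_NS)
    return pairs, no_ns


# Hypothetical instance document declaring two schema locations on its root.
SAMPLE = """<root xmlns="urn:schema1"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd"/>"""

pairs, no_ns = schema_hints(SAMPLE)
```

The resulting locations would still have to be resolved (relative to the instance document, presumably) and loaded by the application before any validation could take place.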
So the question is: maybe I am missing something completely trivial or doing it wrong? Or are my statements about lxml's limitations regarding schema validation true?
To recap, I'd like to be able to:
- have the parser use the schema location declarations in the instance document at parse/validation time
- use multiple schemata to validate an XML document
- declare schema locations on non-root elements (not of extreme importance)
Maybe I should look for a different library? That would be a real shame, though: lxml is the de facto XML processing library for Python and is widely regarded as the best one in terms of performance, features and convenience (and rightfully so, to a certain extent).
Caution: this is not a full answer, because I don't know all that much about lxml in particular.
I can just tell you that:
- Ignoring schemaLocation attributes in documents and instead managing a namespace -> schema file mapping in the application is almost always better, unless you can guarantee that the schema will be in a very specific location relative to the file. If you want to move the mapping out of code, use a catalogue or come up with a configuration file.
- If you do want to use schemaLocation, and want to validate against multiple schemas, just include them all in one schemaLocation attribute as whitespace-separated namespace URI/location pairs:
  xsi:schemaLocation="urn:schema1 schema1.xsd urn:schema2 schema2.xsd"
- Finally, I don't think any processor will find schemaLocation attributes declared on non-root elements. Not that it matters: just put them all on the root.
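As a rough sketch of the application-side namespace -> schema file mapping suggested in the first point, something like the following could work. Everything here is an assumption for illustration: the `SCHEMA_FILES` table, the file paths, and the helper names `namespace_of` and `schemas_for` are all hypothetical, and the code only selects schema files, it does not validate.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-side mapping: namespace URI -> local schema file.
SCHEMA_FILES = {
    "urn:schema1": "schemas/schema1.xsd",
    "urn:schema2": "schemas/schema2.xsd",
}


def namespace_of(tag):
    """Extract the namespace URI from an ElementTree '{uri}local' tag ('' if none)."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def schemas_for(xml_text, mapping=SCHEMA_FILES):
    """Return the sorted schema files for all element namespaces used in the document."""
    root = ET.fromstring(xml_text)
    used = {namespace_of(el.tag) for el in root.iter()}
    missing = sorted(ns for ns in used if ns not in mapping)
    if missing:
        raise KeyError("no schema configured for namespaces: %s" % missing)
    return sorted(mapping[ns] for ns in used)


# Hypothetical instance document mixing two namespaces.
DOC = '<a:order xmlns:a="urn:schema1" xmlns:b="urn:schema2"><b:item/></a:order>'
needed = schemas_for(DOC)
```

The selected files could then be loaded with whatever validator the application uses; the mapping itself could just as well live in a catalogue or a configuration file instead of in code.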