Find elements based on xsd type with lxml

I am trying to get a list of elements with a specific xsd type with lxml 2.x, and I can't figure out how to traverse the xsd for specific types.

Example of schema:

<xsd:element name="ServerOwner" type="srvrs:string90" minOccurs="0"/>
<xsd:element name="HostName" type="srvrs:string35" minOccurs="0"/>

Example xml data:

<srvrs:ServerOwner>John Doe</srvrs:ServerOwner>
<srvrs:HostName>box01.example.com</srvrs:HostName>

The ideal function would look like:

    def getElems(xml_doc, xsd_type):
        # XPath or something to find the elements and build a dict
        elements = {}
        return elements

    elements = getElems(xml_doc, 'string90')


Really the only special support lxml has for XML Schema is validation: it can tell you whether a document is valid according to a schema or not. Anything more sophisticated you'll have to do yourself.
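For reference, here is a minimal sketch of that validation support. The schema, the `http://example.com/srvrs` namespace URI, and the definition of `string35` as a string of at most 35 characters are all assumptions made up for illustration; the question does not show them.

```python
from lxml import etree

# Hypothetical schema: the namespace URI and the string35 definition
# are assumptions, not taken from the question.
SCHEMA = b"""\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:srvrs="http://example.com/srvrs"
            targetNamespace="http://example.com/srvrs"
            elementFormDefault="qualified">
  <xsd:simpleType name="string35">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="35"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="HostName" type="srvrs:string35"/>
</xsd:schema>
"""

# Compile the schema once; it can then validate any number of documents.
schema = etree.XMLSchema(etree.fromstring(SCHEMA))

NS_DECL = b'xmlns:srvrs="http://example.com/srvrs"'
good = etree.fromstring(
    b"<srvrs:HostName " + NS_DECL + b">box01.example.com</srvrs:HostName>")
too_long = etree.fromstring(
    b"<srvrs:HostName " + NS_DECL + b">" + b"x" * 40 + b"</srvrs:HostName>")

# validate() returns a boolean; details of failures end up in schema.error_log.
is_good_valid = schema.validate(good)
is_long_valid = schema.validate(too_long)
```

That yes/no answer is all the schema object gives you. It does not expose which type each element was validated against, which is why the lookup below goes through the schema document itself.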

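This should be a relatively simple two-phase process, I'd think. First, get all the xsd:element elements in the schema that match the type you care about, and look at their names. Note that the comparison is against the type attribute exactly as written in the schema, so for your example typeName would have to be "srvrs:string90", prefix included: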

def getElems(schemaDoc, xmlDoc, typeName):
    names = schemaDoc.xpath("//xsd:element[@type = $n]/@name",
                            namespaces={"xsd": 
                                        "http://www.w3.org/2001/XMLSchema"},
                            n=typeName)

Then, fetch all the elements with each name from the document.

    elements = []
    for name in names: 
        namedElements = xmlDoc.xpath("//*[local-name() = $name]", name=name)
        elements.extend(namedElements)

Now you have a list of elements with the names that matched the type in the schema.

    return elements

Note that the XPath expression for searching the document has to look at every element, so if you can tighten it up to look only in the subsection of the document you care about, it'll go faster.
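Putting the pieces together, here is a self-contained sketch of the same two-phase idea. It departs from the snippets above in two ways, both my own choices: it also accepts the bare type name ('string90') by comparing against whatever follows the prefix in the type attribute, and it collects the matches in a single pass over the document instead of running one XPath query per name. The function name `get_elems_by_type`, the `Server` wrapper element, and the namespace URI are hypothetical.

```python
from lxml import etree

XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

def get_elems_by_type(schema_doc, xml_doc, type_name):
    """Return the elements under xml_doc whose declared xsd type is type_name."""
    # Phase 1: names of all xsd:element declarations with the given type.
    # Matches either the attribute as written ("srvrs:string90") or the
    # part after the prefix ("string90").
    names = schema_doc.xpath(
        "//xsd:element[@type = $n or substring-after(@type, ':') = $n]/@name",
        namespaces=XSD_NS, n=type_name)
    wanted = set(names)
    # Phase 2: one pass over the document, keeping elements whose local
    # name is in the set. Comments and processing instructions have a
    # non-string tag, so they are skipped.
    return [el for el in xml_doc.iter()
            if isinstance(el.tag, str)
            and etree.QName(el).localname in wanted]

# Hypothetical sample data modelled on the question.
schema_doc = etree.fromstring(b"""\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:srvrs="http://example.com/srvrs">
  <xsd:element name="ServerOwner" type="srvrs:string90" minOccurs="0"/>
  <xsd:element name="HostName" type="srvrs:string35" minOccurs="0"/>
</xsd:schema>
""")
xml_doc = etree.fromstring(b"""\
<srvrs:Server xmlns:srvrs="http://example.com/srvrs">
  <srvrs:ServerOwner>John Doe</srvrs:ServerOwner>
  <srvrs:HostName>box01.example.com</srvrs:HostName>
</srvrs:Server>
""")

matches = get_elems_by_type(schema_doc, xml_doc, "string90")
# Build the dict the question asked for: local name -> text.
found = {etree.QName(el).localname: el.text for el in matches}
```

Since the function accepts any element as xml_doc, you can pass in just the subtree you care about to limit how much of the document gets scanned. Keep in mind that matching on local names ignores namespaces, and that two differently typed elements sharing a name in different parts of the schema would not be told apart.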
