How to validate XML with multiple namespaces in Python?

Posted 2023-02-17 12:16

I'm trying to write some unit tests in Python 2.7 to validate against some extensions I've made to the OAI-PMH schema: http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd

The problem I'm running into involves multiple nested namespaces and is caused by this specification in the above-mentioned XSD:

<complexType name="metadataType">
    <annotation>
        <documentation>Metadata must be expressed in XML that complies
        with another XML Schema (namespace=#other). Metadata must be 
        explicitly qualified in the response.</documentation>
    </annotation>
    <sequence>
        <any namespace="##other" processContents="strict"/>
    </sequence>
</complexType>

Here's a snippet of the code I'm using:

import urllib2
from lxml import etree

query = "http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm"
schema_file = file("../schemas/OAI/2.0/OAI-PMH.xsd", "r")
schema_doc = etree.parse(schema_file)
oaischema = etree.XMLSchema(schema_doc)

# xml_headers (a dict of request headers) is defined elsewhere and not shown here
request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
response_doc = etree.fromstring(body)

try:
    oaischema.assertValid(response_doc)
except etree.DocumentInvalid as e:
    # Dump the response with line numbers so the reported line can be located
    for line, text in enumerate(body.split("\n"), 1):
        print "{0}\t{1}".format(line, text)
    print(e.message)

I end up with the following error:

AssertionError: http://localhost:8080/OAI-PMH?verb=GetRecord&by_doc_ID=false&metadataPrefix=nsdl_dc&identifier=http://www.purplemath.com/modules/ratio.htm
Element '{http://www.openarchives.org/OAI/2.0/oai_dc/}oai_dc': No matching global element declaration available, but demanded by the strict wildcard., line 22

I understand the error: the schema requires that the child element of the metadata element be strictly validated, and as far as I can tell the sample XML meets that requirement.

I've written a validator in Java that works, but it would be helpful to have this in Python, since the rest of the solution I'm building is Python-based. To make the Java variant work, I had to make my DocumentFactory namespace-aware; otherwise I got the same error. I haven't found any working example in Python that performs this validation correctly.

Does anyone have an idea how I can get an XML document with multiple nested namespaces, like my sample document, to validate in Python?

Here is the sample XML document that I'm trying to validate:

<?xml version="1.0" encoding="UTF-8"?> 
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
     http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv.org:cs/0112017"
       metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
  <GetRecord>
   <record> 
    <header>
      <identifier>oai:arXiv.org:cs/0112017</identifier> 
      <datestamp>2001-12-14</datestamp>
      <setSpec>cs</setSpec> 
      <setSpec>math</setSpec>
    </header>
    <metadata>
      <oai_dc:dc 
     xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
     xmlns:dc="http://purl.org/dc/elements/1.1/" 
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
     xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
     http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:title>Using Structural Metadata to Localize Experience of 
          Digital Content</dc:title> 
    <dc:creator>Dushay, Naomi</dc:creator>
    <dc:subject>Digital Libraries</dc:subject> 
    <dc:description>With the increasing technical sophistication of 
        both information consumers and providers, there is 
        increasing demand for more meaningful experiences of digital 
        information. We present a framework that separates digital 
        object experience, or rendering, from digital object storage 
        and manipulation, so the rendering can be tailored to 
        particular communities of users.
    </dc:description> 
    <dc:description>Comment: 23 pages including 2 appendices, 
        8 figures</dc:description> 
    <dc:date>2001-12-14</dc:date>
      </oai_dc:dc>
    </metadata>
  </record>
 </GetRecord>
</OAI-PMH>

Found this in lxml's doc on validation:

>>> schema_root = etree.XML('''\
...   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
...     <xsd:element name="a" type="xsd:integer"/>
...   </xsd:schema>
... ''')
>>> schema = etree.XMLSchema(schema_root)

>>> parser = etree.XMLParser(schema = schema)
>>> root = etree.fromstring("<a>5</a>", parser)

So perhaps this is what you need (see the last two lines):

schema_doc = etree.parse(schema_file)
oaischema = etree.XMLSchema(schema_doc)

request = urllib2.Request(query, headers=xml_headers)
response = urllib2.urlopen(request)
body = response.read()
parser = etree.XMLParser(schema = oaischema)
response_doc = etree.fromstring(body, parser)
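
If the strict wildcard error persists, one possible cause is that the compiled schema holds no declarations for the metadata namespace. The message says no global element declaration was found, and as far as I know lxml does not load extra schemas from the xsi:schemaLocation hints in the instance document. A common workaround is a small wrapper schema that imports every namespace involved. Here is a rough sketch that builds such a wrapper using only the standard library. build_wrapper_schema and OAI_IMPORTS are names I made up, and the schema locations are assumptions taken from the question (the local path and the oai_dc URL in the sample document).

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def build_wrapper_schema(imports):
    """Return XSD text with one xs:import per (namespace, schemaLocation) pair."""
    ET.register_namespace("xs", XSD_NS)
    root = ET.Element("{%s}schema" % XSD_NS)
    for namespace, location in imports:
        ET.SubElement(
            root,
            "{%s}import" % XSD_NS,
            {"namespace": namespace, "schemaLocation": location},
        )
    # The default encoding is ASCII, so decoding yields text on Python 2 and 3.
    return ET.tostring(root).decode("ascii")


# Namespaces come from the sample document; the locations are assumptions
# (the local path used in the question and the URL from xsi:schemaLocation).
OAI_IMPORTS = [
    ("http://www.openarchives.org/OAI/2.0/", "../schemas/OAI/2.0/OAI-PMH.xsd"),
    ("http://www.openarchives.org/OAI/2.0/oai_dc/",
     "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"),
]

wrapper_xsd = build_wrapper_schema(OAI_IMPORTS)
print(wrapper_xsd)
```

You could write wrapper_xsd to a file next to your other schemas and load it with etree.parse and etree.XMLSchema exactly as you do for OAI-PMH.xsd now, which should keep relative schemaLocation values resolvable. Pointing the oai_dc entry at a local copy is probably safer than relying on network access during tests.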
