开发者

Python xml etree DTD from a StringIO source?

I'm adapting the following code (created via advice in this question), that took an XML file and it's DTD and converted them to a different format. For this problem only the loading section is important:

xmldoc = open(filename)

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)    
tree = etree.parse(xmldoc, parser)

This worked fine, whilst using the file system, but I'm converting it to run via a web framework, where the two files are loaded via a form.

Loading the xml file works fine:

tree = etree.parse(StringIO(data['xml_file']) 

But as the DTD is linked to in the top of the xml file, the following statement fails:

parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
tree = etree.parse(StringIO(data['xml_file'], parser)

Via this question, I tried:

etree.DTD(StringIO(data['dtd_file'])
tree = etree.parse(StringIO(data['xml_file'])

Whilst the first line doesn't cause an error, the second falls over on unicode entities the DTD is meant to pick up (and does so in the file system version):

XMLSyntaxError: En开发者_运维问答tity 'eacute' not defined, line 4495, column 46

How do I go about correctly loading this DTD?


Here's a short but complete example, using the custom resolver technique @Steven mentioned.

from StringIO import StringIO
from lxml import etree

data = dict(
    xml_file = '''<?xml version="1.0"?>
<!DOCTYPE x SYSTEM "a.dtd">
<x><y>&eacute;zz</y></x>
''',
    dtd_file = '''<!ENTITY eacute "&#233;">
<!ELEMENT x (y)>
<!ELEMENT y (#PCDATA)>
''')

class DTDResolver(etree.Resolver):
     def resolve(self, url, id, context):
         return self.resolve_string(data['dtd_file'], context)

xmldoc = StringIO(data['xml_file'])
parser = etree.XMLParser(dtd_validation=True, load_dtd=True)
parser.resolvers.add(DTDResolver())
try:
    tree = etree.parse(xmldoc, parser)
except etree.XMLSyntaxError as e:
    # handle xml and validation errors


You could probably use a custom resolver. The docs actually give an example of doing this to provide a dtd.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜