Tips for finding prefixed tags in python lxml?
I am trying to using lxml's ElementTree etree to find a specific tag in my xml document. The tag looks as follows:
<text:ageInformation>
<text:statedAge>12</text:statedAge>
</text:ageInformation>
I was hoping to use etree.find('text:statedAge'), but that method does not like 'text' prefix. It mentions that I should add 'text' to the prefix map, but I am not certain how to do it. Any tips?
Edit: I want to be able to write to the hr4e prefixed tags. Here are the important parts of the document:
<?xml version="1.0" encoding="utf-8"?>
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd">
<header>
<documentID root="18c41e51-5f4d-4d15-993e-2a932fed720a" />
<title>Health Records for Everyone Continuity of Care Document</title>
<version>
<number>1</number>
</version>
<confidentiality codeSystem="2.16.840.1.113883.5.25" code="N" />
<documentTimestamp value="201105300211+0800" />
<personalInformation>
<patientInformation>
<personID root="2.16.840.1.113883.3.881.PI13023911" />
<personAddress>
<streetAddressLine nullFlavor="NI" />
<city>Santa Cruz</city>
<state nullFlavor="NI" />
<postalCode nullFlavor="NI" />
</personAddress>
<personPhone nullFlavor="NI" />
<personInformation>
<personName>
<given>Benjamin</given>
<family>Keidan</family>
</personName>
<gender codeSystem="2.16.840.1.113883.5.1" code="M" />
<personDateOfBirth value="NI" />
<hr4e:ageInformation>
<hr4e:statedAge>9424</hr4e:statedAge>
<hr4e:estimatedAge>0912</hr4e:estimatedAge>
<hr4e:ye开发者_运维知识库arInSchool>1</hr4e:yearInSchool>
<hr4e:statusInSchool>attending</hr4e:statusInSchool>
</hr4e:ageInformation>
</personInformation>
<hr4e:livingSituation>
<hr4e:homeVillage>Putney</hr4e:homeVillage>
<hr4e:tribe>Oromo</hr4e:tribe>
</hr4e:livingSituation>
</patientInformation>
</personalInformation>
The namespace prefix must be declared (mapped to an URI) in the XML document. Then you can use the {URI}localname
notation to find text:statedAge
and other elements. Something like this:
from lxml import etree
XML = """
<root xmlns:text="http://example.com">
<text:ageInformation>
<text:statedAge>12</text:statedAge>
</text:ageInformation>
</root>"""
root = etree.fromstring(XML)
ageinfo = root.find("{http://example.com}ageInformation")
age = ageinfo.find("{http://example.com}statedAge")
print age.text
This will print "12".
Another way of doing it:
ageinfo = root.find("text:ageInformation",
namespaces={"text": "http://example.com"})
age = ageinfo.find("text:statedAge",
namespaces={"text": "http://example.com"})
print age.text
You can also use XPath:
age = root.xpath("//text:statedAge",
namespaces={"text": "http://example.com"})[0]
print age.text
I ended up having to use nested prefixes:
from lxml import etree
XML = """
<greenCCD xmlns="AlschulerAssociates::GreenCDA" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:hr4e="hr4e::patientdata" xsi:schemaLocation="AlschulerAssociates::GreenCDA green_ccd.xsd">
<personInformation>
<hr4e:ageInformation>
<hr4e:statedAge>12</hr4e:statedAge>
</hr4e:ageInformation>
</personInformation>
</greenCCD>"""
root = etree.fromstring(XML)
#root = etree.parse("hr4e_patient.xml")
ageinfo = root.find("{AlschulerAssociates::GreenCDA}personInformation/{hr4e::patientdata}ageInformation")
age = ageinfo.find("{hr4e::patientdata}statedAge")
print age.text
精彩评论