Find and Replace tags in XML using Python
I asked a similar question before, but this one is slightly different. I want to find and replace XML tags using Python. I am using the XML files as metadata to upload for some GIS shapefiles. In the metadata editor, I have options to choose dates for when certain data was collected. The options are 'single date', 'multiple dates' and 'range of dates'. The first XML, which contains tags for a range of dates, has a 'rngdates' tag with the subelements 'begdate', 'begtime', 'enddate' and 'endtime'. I want to replace these tags so that the result looks like the second XML, which contains multiple single dates. The new tags are 'mdattim', 'sngdate' and 'caldate'. I hope this is clear enough, but please ask for more info if needed. XML is a weird beast, and I still don't fully understand it.
Thanks, Mike
First XML:
<idinfo>
<citation>
<citeinfo>
<origin>My Company Name</origin>
<pubdate>05/04/2009</pubdate>
<title>Feature Class Name</title>
<edition>0</edition>
<geoform>vector digital data</geoform>
<onlink>.</onlink>
</citeinfo>
</citation>
<descript>
<abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
<purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
</descript>
<timeperd>
<timeinfo>
<rngdates>
<begdate>7/13/2010</begdate>
<begtime>unknown</begtime>
<enddate>7/15/2010</enddate>
<endtime>unknown</endtime>
</rngdates>
</timeinfo>
<current>ground condition</current>
</timeperd>
Second XML:
<idinfo>
<citation>
<citeinfo>
<origin>My Company Name</origin>
<pubdate>03/07/2011</pubdate>
<title>Feature Class Name</title>
<edition>0</edition>
<geoform>vector digital data</geoform>
<onlink>.</onlink>
</citeinfo>
</citation>
<descript>
<abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
<purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
</descript>
<timeperd>
<timeinfo>
<mdattim>
<sngdate>
<caldate>08-24-2009</caldate>
<time>unknown</time>
</sngdate>
<sngdate>
<caldate>08-26-2009</caldate>
</sngdate>
<sngdate>
<caldate>08-26-2009</caldate>
</sngdate>
<sngdate>
<caldate>07-07-2010</caldate>
</sngdate>
</mdattim>
</timeinfo>
This is my Python code so far:
import glob
import os
from xml.etree.ElementTree import ElementTree

folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"
# glob already returns full paths, so no extra join is needed
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):
    if os.path.isfile(fullpath):
        root = ElementTree(file=fullpath)
        # Iterate over every element in the tree
        for element in root.iter():
            print(element.tag)
            if element.tag == "begdate":
                # Assign the new name; calling element.tag.replace() alone
                # returns a new string and leaves the tag unchanged
                element.tag = "sngdate"
I believe I succeeded in making the code work. It lets you edit certain tags in an existing XML file. I needed this for a batch processing script that creates metadata for some GIS shapefiles and changes certain date values depending on whether they are single dates, multiple dates or a range of dates.
This webpage helped a lot: http://lxml.de/tutorial.html
I have some more work to do, but this was the answer I was looking for from my original question :) I'm sure this can be used in many other applications.
import glob
import os
import xml.etree.ElementTree as ET

# Set workspace location for XML files
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"
# Loop through each file with an .xml extension (glob returns full paths)
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):
    if not os.path.isfile(fullpath):
        continue
    # Parse the XML file
    tree = ET.parse(fullpath)
    # Find the "timeinfo" tag; skip files that do not have one
    timeinfo = tree.find(".//timeinfo")
    if timeinfo is None:
        continue
    # Clear all tags below the "timeinfo" tag
    timeinfo.clear()
    # Append the new "mdattim" element and nest the date tags under it
    mdattim = ET.SubElement(timeinfo, "mdattim")
    sngdate = ET.SubElement(mdattim, "sngdate")
    caldate = ET.SubElement(sngdate, "caldate")
    time_tag = ET.SubElement(sngdate, "time")
    # Set text values for tags
    caldate.text = "08-24-2009"
    time_tag.text = "unknown"
    # Assumption: save in place (overwrites the file); the original
    # snippet stopped before writing anything back
    tree.write(fullpath)
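For anyone who wants to try the same idea without touching files on disk, here is a minimal sketch that works on an in-memory string. It uses the standard library's xml.etree.ElementTree rather than lxml, and the helper name replace_with_multiple_dates, the SAMPLE string and the list of dates are made up for illustration; they are not part of the original script.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, trimmed from the first XML in the question.
SAMPLE = """<timeperd>
<timeinfo>
<rngdates>
<begdate>7/13/2010</begdate>
<begtime>unknown</begtime>
<enddate>7/15/2010</enddate>
<endtime>unknown</endtime>
</rngdates>
</timeinfo>
<current>ground condition</current>
</timeperd>"""


def replace_with_multiple_dates(root, dates):
    """Replace the contents of every <timeinfo> with <mdattim>/<sngdate> entries.

    `dates` is a list of (caldate, time) pairs; pass None as the time to
    leave out the <time> tag. Returns the number of <timeinfo> elements changed.
    """
    changed = 0
    # Collect the matches first so the tree is not modified while iterating.
    for timeinfo in list(root.iter("timeinfo")):
        timeinfo.clear()
        mdattim = ET.SubElement(timeinfo, "mdattim")
        for caldate_text, time_text in dates:
            sngdate = ET.SubElement(mdattim, "sngdate")
            ET.SubElement(sngdate, "caldate").text = caldate_text
            if time_text is not None:
                ET.SubElement(sngdate, "time").text = time_text
        changed += 1
    return changed


root = ET.fromstring(SAMPLE)
replace_with_multiple_dates(root, [("08-24-2009", "unknown"), ("08-26-2009", None)])
result = ET.tostring(root, encoding="unicode")
print(result)
```

In a batch script, the same helper could be called on the root of each parsed file before writing it back, with the list of dates supplied per file.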