How to turn Python list comprehensions into XML
I need a little help finding a tutorial or sample on taking a list comprehension, merging it with data from a CSV file, and turning all of that into an XML file. From reading various Python books and PDFs (DITP, IYOCGwP, Learn Python the Hard Way, the lxml tutorial, Think Python) and from online searches, I am most of the way there, or so I think. I just need a push on tying everything together. I am basically taking an Excel spreadsheet, which I export as a CSV file. The CSV contains rows of records that I need to map into an XML file. I am new to Python and thought I would use this little project to learn the language. The code listed is not pretty, but it works. I can read in a CSV file and dump it into a list. I can combine three lists and output the resulting list, and I can get my program to spit out a skeleton XML that is almost laid out in the format I need. Below the code I list my actual output for a small sample and the XML I am trying to produce. Sorry if this is too lengthy; this is my first post.
import csv, datetime, os, sys
from lxml import etree
f = os.path.getsize("SO.csv")
fh = "SO.csv"
rh = open(fh, "rU")
rows = 0
try:
    rlist = csv.reader(rh)
    reports = []
    for row in rlist:
        # strip surrounding spaces from every field of the row
        rowStripped = [x.strip(' ') for x in row]
        reports.append(rowStripped)
        rows += 1
except csv.Error, e:
    # report the file name and the reader's current line number
    sys.exit('file %s, line %d: %s' % (fh, rlist.line_num, e))
finally:
    rh.close()
root = etree.Element("co_ehs")
object = etree.SubElement(root, "object")
event = etree.SubElement(object, "event")
facets = etree.SubElement(event, "facets")
categories = etree.SubElement(facets, "categories")
instance = etree.SubElement(categories, "instance")
property = etree.SubElement(instance, "property")
facets = ['header','header','header','header','informational','header','informational']
categories = ['processing','processing','processing','processing','short_title','file_num','short_narrative']
property = ['REPORT ID','NEXT REPORT ID','initial-event-date','number','title','summary-docket-num','description-story']
print('----------Printing Reports from CSV Data----------')
print reports
print('---------END OF CSV DATA-------------')
print
mappings = zip(facets, categories, property)
print('----------Printing Mappings from the zip of facets, categories, property ----------')
print mappings
print('---------END OF List Comprehension-------------')
print
print('----------Printing the xml skeleton that will contain the mappings and the csv data ----------')
print(etree.tostring(root, xml_declaration=True, encoding='UTF-8', pretty_print=True))
print('---------END OF XML Skeleton-------------')
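The step that ties the pieces together is pairing each (facet, category, property) triple with the value in the same column of a CSV row. A minimal sketch, written for Python 3 and assuming (as the desired output below implies) that the seven mappings line up one-to-one with the seven CSV columns; pair_row is a hypothetical helper name, and the property list is renamed properties so it does not shadow the builtin property.

```python
# Mapping lists copied from the question: one entry per CSV column.
facets = ['header', 'header', 'header', 'header', 'informational', 'header', 'informational']
categories = ['processing', 'processing', 'processing', 'processing',
              'short_title', 'file_num', 'short_narrative']
properties = ['REPORT ID', 'NEXT REPORT ID', 'initial-event-date', 'number',
              'title', 'summary-docket-num', 'description-story']

mappings = list(zip(facets, categories, properties))

def pair_row(row):
    """Hypothetical helper: attach each CSV value to its (facet, category, property) triple."""
    return [(facet, cat, prop, value.strip())
            for (facet, cat, prop), value in zip(mappings, row)]

# First data row of SO.csv.
row = ['1', '-1', '12-Dec-04', '545671', 'Vehicle Collision', '786689',
       'No fault collision due to ice']
for item in pair_row(row):
    print(item)
```

Each resulting tuple holds everything one property element needs, so the XML writer only has to loop over them.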
----My OUTPUT---
----------Printing Reports from CSV Data----------
[['1', '12-Dec-04', 'Vehicle Collision', '786689', 'No fault collision due to ice', '-1', '545671'], ['3', '15-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '4', '588456'], ['4', '17-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '-1', '58871'], ['1000', '12-Nov-05', 'Back Injury', '9854231', 'Lifting without a support device', '-1', '545671'], ['55555', '12-Jan-06', 'Foot Injury', '7936547', 'Office injury - heavy item dropped on foot', '-1', '545671']]
---------END OF CSV DATA-------------
----------Printing Mappings from the zip of facets, categories, property ----------
[('header', 'processing', 'REPORT ID'), ('header', 'processing', 'NEXT REPORT ID'), ('header', 'processing', 'initial-event-date'), ('header', 'processing', 'number'), ('informational', 'short_title', 'title'), ('header', 'file_num', 'summary-docket-num'), ('informational', 'short_narrative', 'description-story')]
---------END OF List Comprehension-------------
----------Printing the xml skeleton that will contain the mappings and the csv data ----------
<?xml version='1.0' encoding='UTF-8'?>
<co_ehs>
  <object>
    <event>
      <facets>
        <categories>
          <instance>
            <property/>
          </instance>
        </categories>
      </facets>
    </event>
  </object>
</co_ehs>
---------END OF XML Skeleton-------------
----------CSV DATA------------------
C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device"
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot"
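Because the CSV has a header row, the standard library's csv.DictReader can consume it and key each record by column name. A sketch for Python 3, assuming a hypothetical COLUMN_TO_PROPERTY lookup from the CSV column names to the XML property names in the desired output; the sample is inlined with io.StringIO, where real code would use open("SO.csv", newline="").

```python
import csv
import io

# Two rows of the sample above, inlined so the sketch runs on its own.
SAMPLE = """C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"
"""

# Hypothetical lookup: CSV column name -> XML property name.
COLUMN_TO_PROPERTY = {
    "C_ID": "REPORT ID",
    "NEXT_C_ID": "NEXT REPORT ID",
    "C_DATE": "initial-event-date",
    "C_NUMBER": "number",
    "C_EVENT": "title",
    "C_DOCKETNUM": "summary-docket-num",
    "C_DESCRIPTION": "description-story",
}

def read_reports(fileobj):
    """Yield one dict per data row, keyed by XML property name."""
    # DictReader reads the header line itself, so no manual skipping is needed.
    for record in csv.DictReader(fileobj):
        yield {COLUMN_TO_PROPERTY[col]: value.strip() for col, value in record.items()}

reports = list(read_reports(io.StringIO(SAMPLE)))
print(reports[0]["title"])  # Vehicle Collision
```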
-----------What I want the xml output to look like----------------------
<?xml version="1.0" encoding="UTF-8"?>
<co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="co_ehs.xsd">
  <object id="3" object-type="ehs_report">
    <event event-tag="0">
      <facets name="header">
        <categories name="processing">
          <instance instance-tag="0">
            <property name="REPORT ID" value="1"/>
            <property name="NEXT REPORT ID" value="-1"/>
            <property name="initial-event-date" value="12-Dec-04"/>
            <property name="number" value="545671"/>
          </instance>
        </categories>
      </facets>
      <facets name="informational">
        <categories name="short_title">
          <instance instance-tag="0">
            <property name="title" value="Vehicle Collision"/>
          </instance>
        </categories>
      </facets>
      <facets name="header">
        <categories name="file_num">
          <instance instance-tag="0">
            <property name="summary-docket-num" value="786689"/>
          </instance>
        </categories>
      </facets>
      <facets name="informational">
        <categories name="short_narrative">
          <instance instance-tag="0">
            <property name="description-story" value="No fault collision due to ice"/>
          </instance>
        </categories>
      </facets>
    </event>
  </object>
</co_ehs>
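The desired layout groups consecutive properties that share a facet and category into one facets/categories/instance block: the first four mappings are all header/processing, and each remaining one gets its own block. A sketch (Python 3) of deriving those blocks from the question's mapping lists with itertools.groupby, which merges only adjacent equal keys; that matches the output, where "header" appears as two separate blocks.

```python
from itertools import groupby

facets = ['header', 'header', 'header', 'header', 'informational', 'header', 'informational']
categories = ['processing', 'processing', 'processing', 'processing',
              'short_title', 'file_num', 'short_narrative']
properties = ['REPORT ID', 'NEXT REPORT ID', 'initial-event-date', 'number',
              'title', 'summary-docket-num', 'description-story']

# Number each mapping by its CSV column index before grouping.
indexed = [(col, facet, cat, prop)
           for col, (facet, cat, prop) in enumerate(zip(facets, categories, properties))]

blocks = []
for (facet, cat), members in groupby(indexed, key=lambda m: (m[1], m[2])):
    # Each block keeps the (column index, property name) pairs it should emit.
    blocks.append((facet, cat, [(col, prop) for col, _, _, prop in members]))

for block in blocks:
    print(block)
```

The column indices in each block correspond to the 0-4, 4-5, 5-6 and 6-7 ranges that the lxml answer passes to makeFacet.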
Here is my solution. I use lxml, because it's normally better to generate XML with a framework than with strings or a template file.
The attributes of co_ehs are missing, but they are easy to add: a set() call for xsi:noNamespaceSchemaLocation, with the xsi prefix declared through the element's nsmap. I leave it up to you to do this.
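A sketch of adding the two co_ehs attributes. It uses the standard library's xml.etree.ElementTree (Python 3) instead of lxml so it runs without extra packages; with lxml the equivalent is etree.Element("co_ehs", nsmap={"xsi": XSI}) followed by the same set() call. XSI is just a local constant name.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Bind the "xsi" prefix to the XSI namespace so the serializer writes xmlns:xsi
# instead of an auto-generated prefix such as ns0.
ET.register_namespace("xsi", XSI)

root = ET.Element("co_ehs")
# A namespaced attribute is set with the {namespace-uri}localname notation.
root.set("{%s}noNamespaceSchemaLocation" % XSI, "co_ehs.xsd")

print(ET.tostring(root, encoding="unicode"))
```

The namespace declaration itself is never set by hand; the serializer emits it because an attribute in that namespace is present.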
BTW: You can accept the best answer by clicking on the check mark on the left side of the answer
import csv, sys
from lxml import etree

def makeFacet(event, newheaders, ev, facetname, catname, count, nhposstart, nhposend):
    # one <facets>/<categories>/<instance> block holding columns nhposstart..nhposend-1
    facets = etree.SubElement(event, "facets", name=facetname)
    categories = etree.SubElement(facets, "categories", name=catname)
    instance = etree.SubElement(categories, "instance")
    instance.set("instance-tag", count)
    for i in range(nhposstart, nhposend):
        prop = etree.SubElement(instance, "property")
        prop.set("name", newheaders[i])
        prop.set("value", ev[i].strip())

# read the csv
fh = "SO.csv"
rh = open(fh, "rU")
try:
    reader = csv.reader(rh)
    rlist = list(reader)
except csv.Error as e:
    sys.exit("file %s, line %d: %s" % (fh, reader.line_num, e))
finally:
    rh.close()

# generate the xml
# newheaders maps the csv columns to XML property names, because the csv names don't correspond with the XML
newheaders = ["REPORT ID", "NEXT REPORT ID", "initial-event-date", "number", "title", "summary-docket-num", "description-story"]
root = etree.Element("co_ehs")
obj = etree.SubElement(root, "object")
obj.set("id", "3") # Not sure about this one
obj.set("object-type", "ehs_report")
for c, ev in enumerate(rlist[1:]):  # rlist[0] is the header row
    event = etree.SubElement(obj, "event")
    event.set("event-tag", "%s" % c)
    makeFacet(event, newheaders, ev, "header", "processing", "%s" % c, 0, 4)
    makeFacet(event, newheaders, ev, "informational", "short_title", "%s" % c, 4, 5)
    makeFacet(event, newheaders, ev, "header", "file_num", "%s" % c, 5, 6)
    makeFacet(event, newheaders, ev, "informational", "short_narrative", "%s" % c, 6, 7)
print(etree.tostring(root, xml_declaration=True, encoding="UTF-8", pretty_print=True))
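The four makeFacet calls can also be driven by a small table, so changing the layout means editing data rather than code. A sketch under stated assumptions: Python 3, the standard library's xml.etree.ElementTree swapped in for lxml, and LAYOUT, NAMES and build_tree as hypothetical names; instance-tag follows the answer's per-event counter.

```python
import csv
import io
import xml.etree.ElementTree as ET

# (facet name, category name, first column, end column) for each block.
LAYOUT = [
    ("header", "processing", 0, 4),
    ("informational", "short_title", 4, 5),
    ("header", "file_num", 5, 6),
    ("informational", "short_narrative", 6, 7),
]
NAMES = ["REPORT ID", "NEXT REPORT ID", "initial-event-date", "number",
         "title", "summary-docket-num", "description-story"]

def build_tree(rows):
    """Build co_ehs/object/event... for each CSV data row."""
    root = ET.Element("co_ehs")
    obj = ET.SubElement(root, "object", {"id": "3", "object-type": "ehs_report"})
    for tag, row in enumerate(rows):
        # Hyphenated attribute names must be passed as a dict, not keywords.
        event = ET.SubElement(obj, "event", {"event-tag": str(tag)})
        for facet, cat, start, end in LAYOUT:
            facets = ET.SubElement(event, "facets", name=facet)
            categories = ET.SubElement(facets, "categories", name=cat)
            instance = ET.SubElement(categories, "instance", {"instance-tag": str(tag)})
            for i in range(start, end):
                ET.SubElement(instance, "property", name=NAMES[i], value=row[i].strip())
    return ET.ElementTree(root)

sample = io.StringIO(
    "C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION\n"
    '1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"\n'
)
reader = csv.reader(sample)
next(reader)  # skip the header row
tree = build_tree(reader)

# write() adds the XML declaration; a real script would pass a file name instead.
out = io.BytesIO()
tree.write(out, encoding="UTF-8", xml_declaration=True)
print(out.getvalue().decode("utf-8"))
```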
I created a file named 'pattern.txt' with the following content (with this indentation). Notice the 8 %s placeholders put at strategic places.
    <event event-tag="%s">
      <facets name="header">
        <categories name="processing">
          <instance instance-tag="0">
            <property name="REPORT ID" value="%s"/>
            <property name="NEXT REPORT ID" value="%s"/>
            <property name="initial-event-date" value="%s"/>
            <property name="number" value="%s"/>
          </instance>
        </categories>
      </facets>
      <facets name="informational">
        <categories name="short_title">
          <instance instance-tag="0">
            <property name="title" value="%s"/>
          </instance>
        </categories>
      </facets>
      <facets name="header">
        <categories name="file_num">
          <instance instance-tag="0">
            <property name="summary-docket-num" value="%s"/>
          </instance>
        </categories>
      </facets>
      <facets name="informational">
        <categories name="short_narrative">
          <instance instance-tag="0">
            <property name="description-story" value="%s"/>
          </instance>
        </categories>
      </facets>
    </event>
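Positional %s placeholders depend on the values arriving in exactly the right order. A variation, sketched for Python 3, uses named %(...)s placeholders filled from csv.DictReader records, so each placeholder names its CSV column; the shortened PATTERN and the extra "tag" key are assumptions for illustration.

```python
import csv
import io

# Shortened, hypothetical pattern with named placeholders instead of bare %s.
PATTERN = """<event event-tag="%(tag)s">
  <facets name="informational">
    <categories name="short_title">
      <instance instance-tag="0">
        <property name="title" value="%(C_EVENT)s"/>
      </instance>
    </categories>
  </facets>
</event>"""

sample = io.StringIO(
    "C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION\n"
    '1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"\n'
    '3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"\n'
)

chunks = []
for i, record in enumerate(csv.DictReader(sample)):
    # The record supplies values by column name; the loop counter supplies the tag.
    values = dict(record, tag=i)
    chunks.append(PATTERN % values)

print("\n".join(chunks))
```

Unused keys in the mapping are simply ignored by % formatting, so the full pattern can pick whichever columns it needs.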
I created a file 'SO.csv' with the following content:
C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice"
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns"
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device"
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot"
And I ran the following code:
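import csv

# on Python 3, open the csv with open('SO.csv', newline='') instead of 'rb'
with open('SO.csv', 'rb') as fcsv:
    rid = csv.reader(fcsv)
    next(rid)  # skip the header row
    with open('pattern.txt') as f:
        pati = f.read()
    # the XML declaration must be the very first thing in the file, with no leading space
    xmloutput = ['<?xml version="1.0" encoding="UTF-8"?>',
                 '<co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
                 'xsi:noNamespaceSchemaLocation="co_ehs.xsd">',
                 '  <object id="3" object-type="ehs_report">']
    for i, row in enumerate(rid):
        # put the event-tag in front of the 7 csv values: 8 values for the 8 %s
        row.insert(0, str(i))
        xmloutput.append(pati % tuple(row))
# close the elements opened above
xmloutput += ['  </object>', '</co_ehs>']
print('\n'.join(xmloutput))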
Does this help?
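One caveat with the template approach: % substitution inserts the CSV text verbatim, so a value containing &, < or a double quote would produce malformed XML. A sketch (Python 3) of escaping each value first with the standard library's xml.sax.saxutils.escape, passing an extra entity so double quotes are safe inside value="..."; xml_attr is a hypothetical helper and the description text is a made-up value with special characters.

```python
from xml.sax.saxutils import escape

def xml_attr(value):
    """Escape &, <, > (handled by escape) plus double quotes for use inside value="..."."""
    return escape(value, {'"': "&quot;"})

# Hypothetical row whose description contains XML special characters.
row = ["1", "-1", "12-Dec-04", "545671", "Vehicle Collision", "786689",
       'Ice & snow, "no fault"']
safe = [xml_attr(v) for v in row]
line = '<property name="description-story" value="%s"/>' % safe[6]
print(line)
```

In the script above, the same idea means building the tuple from escaped values, for example pati % tuple(xml_attr(v) for v in row).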