Python lxml: Strange issue with reading repeated elements and storing in a list
I'm running into a bizarre issue. I've written two very different versions of code to solve the same task, and both run into the same problem.
I have simplified the problem down to this.
Here is the XML file:
<Test>
  <Object name="Ob1">
    <List/>
  </Object>
  <Object name="Ob2">
    <List>
      <item>One</item>
      <item>Two</item>
    </List>
  </Object>
  <Object name="Ob3">
    <List>
      <item>Three</item>
      <item>Four</item>
      <item>Five</item>
    </List>
  </Object>
</Test>
Here is the Python code:
from lxml import etree

# Load XML
fileobject = open("list_test.xml", "r")  # read-only
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(fileobject, parser)
root = tree.getroot()

object_list = []

class TestClass():
    name = None
    list = []

for OB in root:
    temp_ob = TestClass()
    temp_ob.name = OB.get("name")
    for SubElem in OB:
        if SubElem.tag == "List":
            for item in SubElem:
                if item.tag == "item":
                    temp_ob.list.append(item.text)
    object_list.append(temp_ob)
    del temp_ob

for ob in object_list:
    print(ob.name)
    print(ob.list)
The code is supposed to collect the <item> elements of each <Object> into a list on that object, and to store each of those objects in object_list.
However, here is the output I am getting:
Ob1
['One', 'Two', 'Three', 'Four', 'Five']
Ob2
['One', 'Two', 'Three', 'Four', 'Five']
Ob3
['One', 'Two', 'Three', 'Four', 'Five']
Why is it getting all the <item> elements in the WHOLE document?
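TestClass.list is a class-level attribute, so every instance shares one list object, and each temp_ob.list.append() call appends to that same list. Looking up temp_ob.list finds no attribute on the instance itself, so Python falls back to the class attribute, and append() then mutates that shared list in place.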
For example:
class Foo(object):
    lst = []

f1 = Foo()
f1.lst.append(1)
f2 = Foo()
f2.lst.append(2)
print(f1.lst)  # [1, 2]
print(f2.lst)  # [1, 2]
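The same rule explains why name in the question comes out right even though it is also declared at class level: temp_ob.name = ... is an assignment, which creates a new attribute on the instance and shadows the class attribute, whereas temp_ob.list.append(...) only reads the attribute and mutates the list it finds on the class. Here is a minimal sketch reusing the question's TestClass (t1 and t2 are just illustrative instance names), with vars() showing what is actually stored on each instance:

```python
class TestClass(object):
    name = None  # class attribute; instances will shadow it by assignment
    list = []    # class attribute; ONE list object shared by every instance

t1 = TestClass()
t1.name = "Ob1"        # assignment: creates t1's own 'name' attribute
t1.list.append("One")  # no 'list' on t1, so this mutates TestClass.list

t2 = TestClass()
t2.name = "Ob2"
t2.list.append("Two")  # appends to the very same shared list

print(vars(t1))                   # {'name': 'Ob1'} (no 'list' key on the instance)
print(t1.list is TestClass.list)  # True
print(t2.list)                    # ['One', 'Two']
```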
You should make it an instance-level attribute instead, created in __init__ so that each instance gets its own list:
class Bar(object):
    def __init__(self):
        self.lst = []

b1 = Bar()
b1.lst.append(1)
b2 = Bar()
b2.lst.append(2)
print(b1.lst)  # [1]
print(b2.lst)  # [2]
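Applied to the question's code, the instance-attribute fix might look like the following. It is a minimal sketch with a few assumptions that are not part of the original answer: the XML is embedded as a string instead of being read from list_test.xml, findall("List/item") replaces the nested tag checks, and the import falls back to the standard library's xml.etree.ElementTree when lxml is not installed (both libraries support the calls used here).

```python
# Prefer lxml as in the question; fall back to the stdlib ElementTree,
# which supports the same fromstring/get/findall/text calls used below.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

XML = """<Test>
  <Object name="Ob1">
    <List/>
  </Object>
  <Object name="Ob2">
    <List>
      <item>One</item>
      <item>Two</item>
    </List>
  </Object>
  <Object name="Ob3">
    <List>
      <item>Three</item>
      <item>Four</item>
      <item>Five</item>
    </List>
  </Object>
</Test>"""


class TestClass(object):
    def __init__(self, name):
        self.name = name
        self.list = []  # created per instance, so nothing is shared


root = etree.fromstring(XML)
object_list = []
for ob_elem in root:
    temp_ob = TestClass(ob_elem.get("name"))
    # "List/item" matches only the <item> children of this Object's <List>
    for item in ob_elem.findall("List/item"):
        temp_ob.list.append(item.text)
    object_list.append(temp_ob)

for ob in object_list:
    print(ob.name)
    print(ob.list)
```

With the list created in __init__, Ob1 ends up with an empty list, Ob2 with ['One', 'Two'], and Ob3 with ['Three', 'Four', 'Five'].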