Organizing XML data into dictionaries

Posted 2023-03-22 07:52

I'm trying to organize XML data into a dictionary. It will be used to run Monte Carlo simulations.

Here is an example of what a couple of entries in the XML look like:

<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>

My current data sets have only two variables, and the type can be one of three, possibly four, discrete types. Hard-coding two variables isn't a problem, but I would like to start working with data that has many more variables and automate this process. My goal is to import the XML data into a dictionary automatically, so I can manipulate it later without hard-coding the array titles and the variables.
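A minimal sketch of that kind of automation, assuming every <item> has a <variable> child and that any child whose text parses as a number should become a float. Each item is kept as a plain dict of tag to value, so neither the field names nor the variable names are hard-coded. The XML is embedded as a string (via ET.fromstring) only to keep the example self-contained; the helper names parse_value and group_items are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two sample items from the question, inlined so the sketch runs on its own.
SAMPLE = """<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>"""


def parse_value(text):
    """Convert numeric text to float; leave anything else (or None) unchanged."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return text


def group_items(root, key_tag="variable"):
    """Group every <item> under root by the text of its key_tag child.

    Each item becomes a dict of {child tag: value}, so new fields or new
    variables in the XML need no code changes.
    """
    grouped = defaultdict(list)
    for item in root.findall("item"):
        record = {child.tag: parse_value(child.text) for child in item}
        grouped[record[key_tag]].append(record)
    return dict(grouped)


data = group_items(ET.fromstring(SAMPLE))
print(data["stock"][1]["high"])  # -0.29
```

With a file on disk, ET.parse('xmlfile').getroot() gives the same kind of root element to pass to group_items.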

Here is what I have:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass

thisList = masterDictionary[thisKey]
newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
thisSublist.append(newDataPoint)

I'm getting a KeyError at thisList = masterDictionary[thisKey]

I am also trying to create a class to deal with some of the other elements of the XML:

# Define a class for each data point that contains low, hi and freq attributes
class DataPoint:
 def __init__(self, low, high, freq):
  self.low = low
  self.high = high
  self.freq = freq

Would I then be able to check a value with something like:

masterDictionary['stock'][0].freq

Any and all help is appreciated.

UPDATE

Thanks for the help, John. The indentation issues are sloppiness on my part. It's my first time posting on Stack Overflow and I just didn't get the copy/paste right. The part after the else: is in fact indented as part of the for loop, and the class is indented with four spaces in my code; it was just a bad paste here. I'll keep the capitalization convention in mind. Your suggestion worked, and now the commands:

print masterDictionary.keys()
print masterDictionary['stock'][0].low

yield:

['inflation', 'stock']
-0.34

Those are indeed my two variables, and the value matches the XML listed at the top.

UPDATE 2

Well, I thought I had figured this one out, but I was careless again and it turns out I hadn't quite fixed the issue. The previous solution ended up writing all of the data under both dictionary keys, so I have two identical lists containing every data point. The idea is for each dictionary key to receive only the XML items tagged with that variable. Here is the current code:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)

When I input:

print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])

the results are identical for both keys ('stock' and 'inflation'):

-0.22
-0.22
56
56

There are 27 items tagged stock in the XML file and 29 tagged inflation. How can I make the list under each dictionary key collect only the items tagged with that variable?
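The identical output is what the Update 2 loop produces: masterDictionary[thisKey] = thisList stores a reference to the single list created before the loop, not a copy, so every key points at the same list object and every data point is appended to it. (The following if-check never fires either, because the key was just added.) A minimal illustration, with integers standing in for DataPoint objects:

```python
# One list is created outside the loop, as in the Update 2 code.
shared = []
master = {}

for key, value in [("stock", 1), ("inflation", 2), ("stock", 3)]:
    master[key] = shared   # every key is bound to the SAME list object
    shared.append(value)   # so every append is visible under every key

print(master["stock"] is master["inflation"])  # True
print(master["stock"])                         # [1, 2, 3]
```

The way out is to create a new list only when a key is first seen, and then append to the list stored under that key.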

UPDATE 3

It seems to work with two loops, but I have no idea how or why it won't work in a single loop. I stumbled on this by accident:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)

I have tried many permutations to make it work in a single loop, with no luck. Either all of the data ends up under both keys (identical arrays of all the data, which isn't very helpful), or the data is sorted correctly into two distinct arrays but each holds only the last entry, because the loop overwrites the list on every pass.
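A single loop works if a new list is created only the first time a key is seen, and each data point is appended to the list stored under its own key rather than to a shared variable. A minimal sketch reusing the DataPoint class from the question; the XML is inlined so it runs on its own, and the inflation item in it is hypothetical, with made-up numbers.

```python
import xml.etree.ElementTree as ET


# DataPoint as defined in the question.
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


# Inline XML so the sketch is self-contained; the inflation values are made up.
root = ET.fromstring("""<retirement>
    <item><low>-0.34</low><high>-0.32</high><freq>0.03</freq>
          <variable>stock</variable><type>historic</type></item>
    <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
          <variable>stock</variable><type>historic</type></item>
    <item><low>0.01</low><high>0.02</high><freq>0.5</freq>
          <variable>inflation</variable><type>historic</type></item>
</retirement>""")

masterDictionary = {}
for item in root.findall('item'):
    thisKey = item.find('variable').text
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []              # new list only for a new key
    newDataPoint = DataPoint(float(item.find('low').text),
                             float(item.find('high').text),
                             float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)  # append to THIS key's list

print(len(masterDictionary['stock']), len(masterDictionary['inflation']))  # 2 1
```

When reading from a file, tree.findall('item') on the result of ET.parse finds the same items. The if-block plus append can also be shortened to masterDictionary.setdefault(thisKey, []).append(newDataPoint).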

You have a serious indentation problem after the (unnecessary) else: pass. Fix that and try again. Does the problem occur with your sample input data? With other data? The first time around the loop? What is the value of thisKey that is causing the problem [hint: it's reported in the KeyError message]? What are the contents of masterDictionary just before the error happens [hint: sprinkle a few print statements around your code]?

Other remarks not relevant to your problem:

Instead of if thisKey in masterDictionary == False: consider using if thisKey not in masterDictionary: ... comparisons against True or False are almost always redundant and/or a bit of a "code smell".

Python convention is to reserve names with an initial capital letter (like Item) for classes.

Using only one space per indentation level makes code almost illegible and is strongly discouraged. Always use four spaces, unless you have a good reason (and I've never heard of one).

Update: I was wrong. thisKey in masterDictionary == False is worse than I thought. Because in is a comparison operator, Python chains it (like a <= b < c), so the expression means (thisKey in masterDictionary) and (masterDictionary == False). A dict never equals False, so the whole expression is always False and the dictionary is never updated. The fix is as I suggested: use if thisKey not in masterDictionary:
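The chaining is easy to confirm in the interpreter. A small sketch with a stand-in dictionary, comparing the chained form, the explicitly parenthesized form, and the idiomatic not in:

```python
masterDictionary = {'stock': []}

# Chained: means ('stock' in masterDictionary) and (masterDictionary == False)
chained = 'stock' in masterDictionary == False
# A dict never equals False, so the chained form is False even for a missing key.
missing_chained = 'inflation' in masterDictionary == False

# Parentheses break the chain, so this tests what was intended.
parenthesized = ('inflation' in masterDictionary) == False
# The idiomatic way to write the same test.
missing = 'inflation' not in masterDictionary

print(chained, missing_chained, parenthesized, missing)  # False False True True
```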

Also it looks like thisList (initialised but not used) should be thisSublist (used but not initialised).

Change:

if thisKey in masterDictionary == False:

to:

if thisKey not in masterDictionary:

That seems to be why you were getting that error. Also, thisSublist is never assigned before you append to it, so that line would raise a NameError. Append to the list you just fetched from the dictionary instead:

thisList = masterDictionary[thisKey]
thisList.append(newDataPoint)

You have an error in your if-statement inside the for-loop. Instead of

if thisKey in masterDictionary == False:

write

if (thisKey in masterDictionary) == False:

Given the rest of your original code, you will be able to access data like so:

masterDictionary['stock'][0].freq

John Machin makes some valid points regarding style and code smell (and you should think about his suggested changes), but those things will come with time and experience.
