Parsing/Extracting Data from API XML feed with Python and Beautiful Soup
Python/xml newb here playing around with Python and BeautifulSoup trying to learn how to parse XML, specifically messing with the Oodle.com API to list out car classifieds. I've had success with simple XML and BS, but when working with this, I can't seem to get the data I want no matter what I try. I tried reading the Soup documentation for hours and can't figure it out. The XML is structured like:
<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
<current>
....
</current>
<listings>
<element>
<id>8453458345</id>
<title>2009 Toyota Avalon XL Sedan 4D</title>
<body>...</body>
<url>...</url>
<images>
<element>...</element>
<element>...</element>
</images>
<attributes>
<features>...</features>
<mileage>32637</mileage>
<price>19999</price>
<trim>XL</trim>
<vin>9234234234234234</vin>
<year>2009</year>
</attributes>
</element>
<element>.. Next car here ..</element>
<element>..Aaaand next one here ..</element>
</listings>
<meta>...</meta>
</oodle_response>
I first make a request with urllib to grab the feed and save to a local file. Then:
xml = open("temp.xml", "r")
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(xml)
Then I'm not sure what. I've tried a lot of things but everything seems to throw back way more junk than I want and it makes to difficult to find the issue. I'm trying just get the id, title, mileage, price, year, vin. So how do I g开发者_开发百科et these and expedite the process with a loop? Ideally I wanted a for loop like:
for soup.listings.element in soup.listings:
id = soup.listings.element.id
...
I know that doesn't work obviously but something that would fetch info for the listing, and store it into a list, then move onto the next ad. Appreciate the help guys
You could do something like this:
for element in soup('element'):
id = element.id.text
mileage = element.attributes.mileage.text
price = element.attributes.price.text
year = element.attributes.year.text
vin = element.attributes.vin.text
精彩评论