Parsing/Extracting Data from API XML feed with Python and Beautiful Soup

Python/XML newbie here, playing around with Python and BeautifulSoup to learn how to parse XML, specifically using the Oodle.com API to list car classifieds. I've had success with simple XML and BeautifulSoup, but with this feed I can't get the data I want no matter what I try. I read the Soup documentation for hours and still can't figure it out. The XML is structured like:

<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
    <current>
        ....
    </current>
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <body>...</body>
            <url>...</url>
            <images>
                <element>...</element>
                <element>...</element>
            </images>
            <attributes>
                <features>...</features>
                <mileage>32637</mileage>
                <price>19999</price>
                <trim>XL</trim>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>      
        <element>.. Next car here ..</element>
        <element>..Aaaand next one here ..</element>    
    </listings>
    <meta>...</meta>
</oodle_response>

I first make a request with urllib to grab the feed and save to a local file. Then:

from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3

# Parse the saved feed; the file is closed once the soup is built
with open("temp.xml", "r") as xml:
    soup = BeautifulStoneSoup(xml)

Then I'm not sure what to do. I've tried a lot of things, but everything seems to throw back way more junk than I want, which makes it too difficult to find the issue. I'm just trying to get the id, title, mileage, price, year, and vin. So how do I get these and expedite the process with a loop? Ideally I wanted a for loop like:

for soup.listings.element in soup.listings:
    id = soup.listings.element.id
    ...

I know that obviously doesn't work, but I want something that would fetch the info for a listing, store it in a list, then move on to the next ad. Appreciate the help, guys.


You could do something like this:

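# Only direct children of <listings>; soup('element') would also match
# the <element> tags nested under <images>, which have no id or attributes
for element in soup.listings.findAll('element', recursive=False):
    id = element.id.text
    title = element.title.text
    mileage = element.attributes.mileage.text
    price = element.attributes.price.text
    year = element.attributes.year.text
    vin = element.attributes.vin.text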
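
For comparison, here is a minimal sketch of the same loop that also stores each listing in a list, as the question asks. It swaps BeautifulSoup for the standard library's xml.etree.ElementTree so that it runs without extra installs. The names SAMPLE_XML and parse_listings are made up for illustration, and the sample is a trimmed copy of the structure shown in the question, not real feed output.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed structure from the question (assumption: the real
# feed follows this layout); elided parts are left as "...".
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<oodle_response stat="ok">
    <listings>
        <element>
            <id>8453458345</id>
            <title>2009 Toyota Avalon XL Sedan 4D</title>
            <images>
                <element>...</element>
                <element>...</element>
            </images>
            <attributes>
                <mileage>32637</mileage>
                <price>19999</price>
                <trim>XL</trim>
                <vin>9234234234234234</vin>
                <year>2009</year>
            </attributes>
        </element>
    </listings>
</oodle_response>
"""


def parse_listings(xml_text):
    """Return one dict per car listing (hypothetical helper, not part of any API)."""
    root = ET.fromstring(xml_text)
    cars = []
    # "listings/element" matches only <element> tags directly under <listings>,
    # so the <element> tags nested inside <images> are not picked up.
    for listing in root.findall("listings/element"):
        cars.append({
            "id": listing.findtext("id"),
            "title": listing.findtext("title"),
            "mileage": listing.findtext("attributes/mileage"),
            "price": listing.findtext("attributes/price"),
            "year": listing.findtext("attributes/year"),
            "vin": listing.findtext("attributes/vin"),
        })
    return cars


cars = parse_listings(SAMPLE_XML)
for car in cars:
    print(car["id"], car["title"], car["price"])
```

findtext returns None when a tag is missing, so a listing without, say, a vin still produces a dict instead of raising an error. To read the saved file instead of a string, ET.parse("temp.xml").getroot() should give the same root element.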