开发者

how to parse table inside table using beautiful soup?

I tried this: s = soup.findAll("table", {"class": "view"}) But it is giving me the table. But I need the table inside table.

<table class="view" >
    <tr>
        <t开发者_Python百科d width="46%" valign="top">
        <table>
    <tr>
        <td>
            <div style="adasdasd">
                <div class="abc">dasdsadasdasdas</div>
            </div>
            <div>
                <span><span class="aaaaaaa " title="aaaaaaaaaaa"><span>aaaaaaaaaaaaa</span></span> </span>
                <b>My Face</b><br />
                    Hello This is me,
                </div>
            <div class="abc"">
                    Dec 6, 2010 by Alis
                </div>
        </td>
    </tr>
        </table>
    </tr>
    </table>

The things I want to scrap is:

    Hello This is me,

    My Face

    Dec 6, 2010 by Alis


s = soup.findAll("table", {"class": "view"})[0].find("table")

If there's just the one table, you could use .find for the first one too, and drop the [0].


Heres some better formatted html:

<table class="view" >
    <tr>
        <td width="46%" valign="top">
            <table>
                <tr>
                    <td>
                        <div style="adasdasd">
                            <div class="abc">dasdsadasdasdas</div>
                        </div>
                        <div>
                            <span>
                                <span class="aaaaaaa " title="aaaaaaaaaaa">
                                    <span>aaaaaaaaaaaaa</span>
                                </span>
                            </span>
                            <b>My Face</b>
                            <br />
                            Hello This is me,
                        </div>
                        <div class="abc">
                            Dec 6, 2010 by Alis
                        </div>
                    </td>
                </tr>
            </table>
        </td>
    </tr>
</table>

Note: I actually added a tag because it was missing one.

innerTable = soup.find("table", {"class": "view"}).tr.td.table ##Gets the table in the first cell of the first row

innerDiv = innerTable.find("div", {"style": "adasdasd"}).nextSibling #this gets the div in which all of you content resides

So that will get you to that that holds all of your content. From there it's just a little bit of parsing to get the content you actually need.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜