BeautifulSoup question

<parent1>
    <span>Text1</span>
</parnet1>
<parent2>
    <span>Text2</span>
</parnet2>
<parent3>
    <span>Text3</span>
</parnet3>

I'm parsing this with Python and BeautifulSoup. I have a variable soupData that holds a reference to the object I need. How can I get a reference to parent2, for example, given the text Text2? In other words, the problem is filtering span tags by their content. How can I do this?


After correcting the spelling on the end-tags:

[e for e in soup(recursive=False, text=False) if e.span.string == 'Text2']
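The comprehension above can be tried as a self-contained script. This is a minimal sketch, assuming BeautifulSoup 4 with Python's built-in "html.parser" and the question's markup with the end-tags fixed. Calling the soup object is a shortcut for find_all(), so the sketch spells it out as find_all(recursive=False), leaves out the text argument, and adds a None check for top-level tags without a span. One assumption worth stating: parsers such as lxml wrap a fragment in <html><body>, so the parentN tags are no longer top-level and a recursive=False search would not see them. That may be one reason this approach can return an empty list.

```python
from bs4 import BeautifulSoup

# The question's markup with the end-tags spelled correctly.
html = """
<parent1>
    <span>Text1</span>
</parent1>
<parent2>
    <span>Text2</span>
</parent2>
<parent3>
    <span>Text3</span>
</parent3>
"""

# html.parser does not wrap the fragment in <html><body>, so the
# parentN tags are direct children of the soup object.
soup = BeautifulSoup(html, "html.parser")

# find_all(recursive=False) yields only the top-level tags,
# skipping the whitespace strings between them.
matches = [e for e in soup.find_all(recursive=False)
           if e.span is not None and e.span.string == "Text2"]

print([e.name for e in matches])  # ['parent2']
```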


I don't think there's a way to do it in a single step. So:

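for parenttag in soupData.find_all(recursive=False):  # direct child tags only, no whitespace strings
    if parenttag.span is not None and parenttag.span.string == "Text2":
        do_stuff(parenttag)  # do_stuff stands for whatever you need to do with the tag
        break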

It's possible to use a generator expression, but not much shorter.
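The generator-expression variant might look like the sketch below. It assumes soupData is the parsed document itself (built here from a whitespace-free copy of the question's markup with "html.parser"); next() returns the first match, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# Assumption: soupData is the parsed document itself, built here from a
# whitespace-free copy of the question's markup.
html = ("<parent1><span>Text1</span></parent1>"
        "<parent2><span>Text2</span></parent2>"
        "<parent3><span>Text3</span></parent3>")
soupData = BeautifulSoup(html, "html.parser")

# next() takes the first match from the generator and stops searching;
# the default None is returned if no top-level tag matches.
parenttag = next(
    (t for t in soupData.find_all(recursive=False)
     if t.span is not None and t.span.string == "Text2"),
    None,
)
print(parenttag.name if parenttag is not None else "no match")  # parent2
```

As the answer says, it is not much shorter than the loop, but it avoids the explicit break.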


Using Python 2.7.6 and BeautifulSoup 4.3.2, I found that Marcelo's answer gave an empty list. This worked for me, however:

[x.parent for x in bSoup.findAll('span') if x.text == 'Text2'][0]
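On BeautifulSoup 4.4.0 and later, find() can match the span's text directly, which makes this a single call. (The string argument was named text in earlier versions such as the 4.3.2 mentioned above.) A minimal sketch, assuming "html.parser" and a whitespace-free copy of the question's markup:

```python
from bs4 import BeautifulSoup

html = ("<parent1><span>Text1</span></parent1>"
        "<parent2><span>Text2</span></parent2>"
        "<parent3><span>Text3</span></parent3>")
bSoup = BeautifulSoup(html, "html.parser")

# find() returns the first <span> whose .string equals "Text2", or None
# if there is none, so no IndexError is raised on a miss.
span = bSoup.find("span", string="Text2")
parent = span.parent if span is not None else None
print(parent.name if parent is not None else "no match")  # parent2
```

Note that string= compares against the tag's .string, so a span whose content is split across several child nodes (making its .string None) will not match this way.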

Alternatively, here is a ridiculously over-engineered solution (for this particular problem, at least; it might be useful if you will be filtering on criteria too long to fit in a reasonably readable list comprehension). You could do:

def hasText(text):
    def hasTextFunc(x):
        return x.text == text
    return hasTextFunc

to create a function factory, then

hasTextText2 = hasText('Text2')

list(filter(hasTextText2, bSoup.findAll('span')))[0].parent  # list() is needed on Python 3, where filter() returns an iterator

to get a reference to the parent tag you were looking for.
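Since BeautifulSoup 4 also accepts a function as a find_all() filter (it is called with each tag), a function from such a factory can be passed straight to find_all() instead of going through filter(). The sketch below assumes "html.parser" and whitespace-free markup; has_text and its name parameter are hypothetical, a variant of hasText that also checks the tag name.

```python
from bs4 import BeautifulSoup


def has_text(text, name=None):
    # Hypothetical variant of the hasText factory: the optional name
    # argument additionally restricts matches to one tag name.
    def matcher(tag):
        return (name is None or tag.name == name) and tag.get_text() == text
    return matcher


html = ("<parent1><span>Text1</span></parent1>"
        "<parent2><span>Text2</span></parent2>"
        "<parent3><span>Text3</span></parent3>")
bSoup = BeautifulSoup(html, "html.parser")

# find_all() calls matcher() with each tag and keeps those returning True.
spans = bSoup.find_all(has_text("Text2", name="span"))
print([s.parent.name for s in spans])  # ['parent2']
```

The name check matters here: in this whitespace-free markup, parent2's own text is also "Text2", so without it find_all() would return parent2 as well as the span.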
