BeautifulSoup parsing: detailed info

Posted 2023-03-31 19:23

I already asked this question, but my explanation seems to have been unclear, so I am asking again with more detail.

<h2 class="sectionTitle">
CORPORATE HEADQUARTERS  </h2>
277 Park Avenue<br />
New York, New York 10172
<br /><br />United States<br /><br />

I would like to extract only "New York, New York", without the postal code 10172.

Here is another case:

<h2 class="sectionTitle">
BACKGROUND</h2>
He graduated Blabala 
</span>

I would like to extract only "He graduated Blabala".

I have spent a few days on this and it is driving me crazy. Please help, and thank you in advance.

You would still need to give more detail before anyone can write a robust regex.

For example, if you want the second line under "CORPORATE HEADQUARTERS" without the postal code, and that postal code is always present, you can write it like this:

>>> import re
>>> html = '''
... <h2 class="sectionTitle">
... CORPORATE HEADQUARTERS  </h2>
... 277 Park Avenue<br />
... New York, New York 10172
... <br /><br />United States<br /><br />
... 
... <h2 class="sectionTitle">
... BACKGROUND</h2>
... He graduated Blabala
... </span>
... '''
>>> re.search(r'(?s)<h2 class="sectionTitle">\s*CORPORATE HEADQUARTERS\s*</h2>.*?<br />([^<>]+) \d+', html).group(1).strip()
'New York, New York'
>>> re.search(r'(?s)<h2 class="sectionTitle">\s*BACKGROUND\s*</h2>([^<>]+)', html).group(1).strip()
'He graduated Blabala'
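The first pattern above relies on the postal code always being there. Without it, ` \d+` cannot match on that line, so the search either fails (`re.search` returns `None` and `.group(1)` raises `AttributeError`) or, because of `(?s)`, may run on to a number elsewhere in the page. A minimal sketch of a variant that makes the postal code optional, assuming the same markup and that the city line is the first text after the first `<br />` (the names `CITY_RE` and `city_line` are illustrative, not from the original answer):

```python
import re

# Same headquarters markup as in the session above.
HTML = """
<h2 class="sectionTitle">
CORPORATE HEADQUARTERS  </h2>
277 Park Avenue<br />
New York, New York 10172
<br /><br />United States<br /><br />
"""

# After the first <br />, capture the next text line lazily; the optional
# (?:\s+\d+)? group swallows a trailing postal code only when one is present.
CITY_RE = re.compile(
    r'<h2 class="sectionTitle">\s*CORPORATE HEADQUARTERS\s*</h2>'
    r'.*?<br />\s*([^<>]*?)(?:\s+\d+)?\s*<',
    re.DOTALL,
)


def city_line(html):
    """Return the city/state line without a postal code, or None if absent."""
    match = CITY_RE.search(html)
    return match.group(1) if match else None


print(city_line(HTML))                           # with postal code
print(city_line(HTML.replace(" 10172", "")))     # without postal code
```

Both calls return `New York, New York`. Note that any trailing number on that line is treated as a postal code, so this is still only as reliable as the assumptions about the page layout.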

You should use a combination of `tag.contents`, `.split('\n')` to split the text into lines, and `.rsplit(' ', 1)` to split off only the rightmost space-separated token.
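A minimal sketch of that approach with BeautifulSoup, assuming the headings and their text are siblings under one parent, as in the snippets from the question. Instead of indexing the parent's `.contents` list directly, it walks `heading.next_siblings`, which yields the same children that follow the heading in that list. The helpers `lines_after_heading` and `strip_postal_code` are hypothetical names, and taking the second line as the city line is an assumption about this page:

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup copied from the question.
HTML = """
<h2 class="sectionTitle">
CORPORATE HEADQUARTERS  </h2>
277 Park Avenue<br />
New York, New York 10172
<br /><br />United States<br /><br />

<h2 class="sectionTitle">
BACKGROUND</h2>
He graduated Blabala
</span>
"""


def lines_after_heading(soup, title):
    """Hypothetical helper: non-empty text lines following the
    <h2 class="sectionTitle"> whose text equals `title`, up to the next <h2>."""
    for heading in soup.find_all("h2", class_="sectionTitle"):
        if heading.get_text(strip=True) == title:
            break
    else:
        return []  # no such section

    lines = []
    # next_siblings walks the parent's .contents, starting after the heading.
    for node in heading.next_siblings:
        if getattr(node, "name", None) == "h2":
            break  # start of the next section
        if isinstance(node, NavigableString):
            # One text node can span several lines; keep the non-blank ones.
            lines.extend(p.strip() for p in node.split("\n") if p.strip())
    return lines


def strip_postal_code(line):
    """Drop the rightmost space-separated token only if it is all digits."""
    parts = line.rsplit(" ", 1)
    return parts[0] if len(parts) == 2 and parts[1].isdigit() else line


soup = BeautifulSoup(HTML, "html.parser")
address = lines_after_heading(soup, "CORPORATE HEADQUARTERS")
city = strip_postal_code(address[1])  # assumes the city is on the second line
background = " ".join(lines_after_heading(soup, "BACKGROUND"))
print(city)
print(background)
```

This prints `New York, New York` and `He graduated Blabala`. The `isdigit()` check is a design choice so that a line without a postal code is returned unchanged rather than losing its last word.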
