Repeating regex groups

2022-12-15 10:26

I'm trying to get some information from a website. The information I want is in a table, so I wrote a regex, but I don't know the right way to simplify it.

The following are two parts of my regex that I would like to simplify:

<br>(.*)<br>(.*)<br>(.*)

<tr><td>(.+)r>(.+)r>(.+)r>(.+).+</td></tr> # This part should be repeated n times (n = 1 to 10)

I looked through the Python documentation and I can't figure out how to do it. Perhaps you can give me a hint.

Thank you, mF.

This is the wrong way to go unless you're trying to scrape some data out of a tiny fragment.

It would be much better if you used a tolerant HTML parser. BeautifulSoup, mentioned earlier, is a good one, but it's stagnating and I don't believe it's being actively maintained anymore.

A highly recommended parser for Python is lxml.

There was a long thread discussing XHTML parsing on one of our local mailing lists, which you might find useful too.
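
To illustrate the parser-based approach, here is a minimal sketch that reads table rows and splits each cell at its <br> tags. It uses the standard library's html.parser rather than lxml or BeautifulSoup, purely to keep the example dependency-free. The class name RowCollector, the helper parse_rows, and the sample markup are made up for illustration, and the real page may well be structured differently.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text pieces of each <td>, split wherever a <br> occurs."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cells per <tr>
        self._cell = None   # text pieces of the <td> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = [""]
        elif tag == "br" and self._cell is not None:
            self._cell.append("")  # a <br> starts a new piece

    def handle_endtag(self, tag):
        # Store the finished cell in the current row (ignore a <td> outside any <tr>).
        if tag == "td" and self._cell is not None and self.rows:
            self.rows[-1].append([piece.strip() for piece in self._cell])
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[-1] += data


def parse_rows(html):
    """Return a list of rows; each row is a list of cells; each cell is a list of pieces."""
    collector = RowCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


# Hypothetical sample markup, not taken from the actual site.
sample = "<table><tr><td>a<br>b<br>c<br>d</td></tr><tr><td>e<br>f</td></tr></table>"
rows = parse_rows(sample)
print(rows)
```

The number of rows and of <br>-separated pieces no longer has to be known in advance, which is the part that is awkward to express in a single regex.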

RegEx match open tags except XHTML self-contained tags

"Have you tried using an XML parser instead?"

EDIT: This is the way to go: Beautiful Soup

You just need to put the block in parens and then use the {...} operators, e.g.:

(foo...){1,10}

This matches 1 to 10 instances of whatever is inside the parentheses. Given your example above, you can also nest groups:

((f..)(b..)){1,10}
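
One caveat when using this in Python: a quantified capturing group keeps only the text of its last repetition, so the earlier ones are not available from the match object. A small sketch with the nested pattern above, run on a made-up input string, shows the difference and one common workaround using re.findall on the unquantified unit:

```python
import re

# Made-up input containing two repetitions of the (f..)(b..) unit.
text = "foobarfizbuz"

# The quantified group matches both repetitions as a whole...
m = re.match(r"((f..)(b..)){1,10}", text)
print(m.group(0))   # the full matched text
print(m.groups())   # ...but each group holds only its last repetition

# To recover every repetition, search for the repeated unit on its own.
pairs = re.findall(r"(f..)(b..)", text)
print(pairs)
```

So the {1,10} form is fine for checking that the repeated structure is there, while findall (or finditer) is the usual way to pull out each repetition.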
