
Using stored variables as regex patterns

Is there a way for Python to use values stored in variables as regex patterns?

Suppose I have two variables:

begin_tag = '<%marker>'
end_tag = '<%marker/>'

doc = '<html> something here <%marker> and here and here <%marker/> and more here <html>'

How do you extract the text between begin_tag and end_tag?

The tags are determined after parsing another file, so they're not fixed.


Don't use a regex at all. Parse HTML intelligently!

from bs4 import BeautifulSoup  # BeautifulSoup 4 package
marker = 'mytag'
doc = '<html>some stuff <mytag> different stuff </mytag> other things </html>'
soup = BeautifulSoup(doc, 'html.parser')
# decode_contents() returns the tag's inner markup as a string
print(soup.find(marker).decode_contents())


In Python, regular expression patterns are ordinary strings, so you can build them any way you like: concatenation (the + operator), interpolation (the % operator), etc. Just concatenate your variables with the rest of the regex you want to use:

begin_tag + ".*?" + end_tag

The only pitfall is when your variables contain characters that might be taken by the regular expression engine to have special meaning. You need to make sure they are escaped properly in that case. You can do this with the re.escape() function.
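Putting the concatenation and re.escape() together, a minimal sketch might look like the following. The helper name extract_between is made up for illustration, and the capture group plus the re.DOTALL flag (so the text may span several lines) are my additions, not part of the question:

```python
import re

def extract_between(text, begin, end):
    """Return the text between the first `begin` and the next `end`, or None.

    `extract_between` is a hypothetical helper name used for illustration.
    """
    # re.escape() makes the tags match literally, even if they contain
    # characters such as [ ] . * ? that are special in a regex.
    # (.*?) is a non-greedy capture group for the text in between.
    pattern = re.escape(begin) + '(.*?)' + re.escape(end)
    # re.DOTALL is an assumption: it lets '.' match newlines as well.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None

begin_tag = '<%marker>'
end_tag = '<%marker/>'
doc = '<html> something here <%marker> and here and here <%marker/> and more here <html>'

print(repr(extract_between(doc, begin_tag, end_tag)))
```

If only the first occurrence matters, re.search() is enough; re.findall() with the same pattern would collect every tagged span instead.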

The usual caveat ("don't parse HTML with regular expressions") applies.
