How to sub with matched groups and variables in Python

2023-02-17 19:17

I'm new to Python. This is probably simple, but I haven't found an answer.

rndStr = "20101215"
rndStr2 = "20101216"
str = "Looking at dates between 20110316 and 20110317"
outstr = re.sub("(.+)([0-9]{8})(.+)([0-9]{8})",r'\1'+rndStr+r'\2'+rndStr2,str)

The output I'm looking for is:

Looking at dates between 20101215 and 20101216

But instead I get:

P101215101216

The values of the two rndStr variables don't really matter. Assume they're random or taken from user input (I put static values here to keep it simple). Thanks for any help.

Your backreferences are ambiguous. Your replacement string becomes

\120101215\220101216

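which no longer contains a stand-alone \1 or \2: the digits of rndStr and rndStr2 have run straight into the backreferences, giving two rather large numbers to be backreferencing :)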
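To see what the regex engine actually does with that replacement, here is a small sketch (written with Python 3 print calls; the names replacement and out are only illustrative). It reuses the pattern from the question and prints the result with repr so that non-printing characters show up.

```python
import re

rndStr = "20101215"
rndStr2 = "20101216"
s = "Looking at dates between 20110316 and 20110317"

# The raw pieces are glued together before re.sub ever sees them.
replacement = r'\1' + rndStr + r'\2' + rndStr2
print(replacement)  # \120101215\220101216

out = re.sub(r"(.+)([0-9]{8})(.+)([0-9]{8})", replacement, s)
# \120 and \220 are read as octal character escapes, not as \1 and \2.
print(repr(out))
```

The repr output reveals an invisible \x90 character between the two numbers, which is why the printed result looks like P101215101216: \120 is the octal escape for the letter P.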

To solve it, use this syntax:

r'\g<1>'+rndStr+r'\g<2>'+rndStr2

You also have too many sets of parentheses (or "brackets" if you speak British English like me:) - you don't need parentheses around the [0-9]{8} parts which you're not backreferencing:

re.sub("(.+)[0-9]{8}(.+)[0-9]{8}",...

should be sufficient.

(And, as noted elsewhere, don't use str as a variable name. Unless you want to spend ages debugging why str.replace() doesn't work anymore. Not that I ever did that once... noooo. :)

so the whole thing becomes:

import re
rndStr = "20101215"
rndStr2 = "20101216"
s = "Looking at dates between 20110316 and 20110317"
outstr = re.sub(r"(.+)[0-9]{8}(.+)[0-9]{8}", r'\g<1>'+rndStr+r'\g<2>'+rndStr2, s)
print(outstr)

Producing:

Looking at dates between 20101215 and 20101216

Notice if you change the value of rndStr or rndStr2 to text (like 'abc') rather than digits, you get something closer to the expected result?
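A quick sketch of that experiment, assuming the four-group pattern from the question and 'abc' as a stand-in value. With letters after the backreferences, \1 and \2 are read as intended, and the output also shows that \2 is the first date rather than the " and " in between (with this pattern that text is group 3).

```python
import re

s = "Looking at dates between 20110316 and 20110317"
pattern = r"(.+)([0-9]{8})(.+)([0-9]{8})"

# 'abc' does not start with a digit, so \1 and \2 stay unambiguous.
out = re.sub(pattern, r'\1' + 'abc' + r'\2' + 'abc', s)
print(out)  # Looking at dates between abc20110316abc

# Group 3, not group 2, holds the text between the two dates.
print(re.match(pattern, s).groups())
```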

In your call to re.sub you have r'\1'+rndStr+... This concatenates r'\1' and '20101215' into r'\120101215', in which the \1 is no longer a stand-alone reference to group 1, which is probably not what you intended...

You can use named back references to make the back reference unambiguous:

import re

rep1 = "20101215"
rep2 = "20101216"
st = "Looking at dates between 20110316 and 20110317"

print(re.sub(r'(?P<fp>.+)[0-9]{8}(?P<lp>.+)[0-9]{8}',
             r'\g<fp>' + rep1 + r'\g<lp>' + rep2, st))

Better still, use an easier to understand syntax and check the return of the attempted match:

m = re.search(r'(?P<fp>.+)[0-9]{8}(?P<lp>.+)[0-9]{8}', st)
if m:
    print(m.group('fp') + rep1 + m.group('lp') + rep2)  # m.group(1) and m.group(2) work too
else:
    print("no match...")

In either case, your desired string of Looking at dates between 20101215 and 20101216 is produced.
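Since the question mentions values that may come from user input, one further option (not taken from the answers here, just a sketch) is to pass a function as the replacement. re.sub uses the function's return value literally, so digits or backslashes in the inserted values are never parsed as escapes. The helper name replace_dates and its parameters are made up for illustration.

```python
import re

def replace_dates(text, first, second):
    """Swap the two 8-digit dates in text for first and second."""
    pattern = r'(.+)[0-9]{8}(.+)[0-9]{8}'

    def build(m):
        # The returned string is inserted as-is; no template parsing happens.
        return m.group(1) + first + m.group(2) + second

    return re.sub(pattern, build, text)

s = "Looking at dates between 20110316 and 20110317"
print(replace_dates(s, "20101215", "20101216"))
# Looking at dates between 20101215 and 20101216
```

The trade-off is a few more lines in exchange for not having to think about how the replacement string will be parsed.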

The Python docs on named backreferences:

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name 'name'. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named 'id' in the example below can also be referenced as the numbered group 1.

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
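The three uses the docs mention can be tried out with a short sketch. The sample strings and the variable names m, doubled and wrapped are assumptions chosen for illustration; the pattern is the one from the quoted passage.

```python
import re

pattern = re.compile(r'(?P<id>[a-zA-Z_]\w*)')
m = pattern.search("  total = 42")

print(m.group('id'))  # total
print(m.group(1))     # total (a named group is also a numbered group)
print(m.end('id'))    # 7

# (?P=id) inside a pattern matches the same text again.
doubled = re.search(r'(?P<id>[a-zA-Z_]\w*) (?P=id)', "say hello hello")
print(doubled.group(0))  # hello hello

# \g<id> in replacement text inserts the named group's match.
wrapped = pattern.sub(r'<\g<id>>', "a b")
print(wrapped)  # <a> <b>
```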

rndStr = "20101215"
rndStr2 = "20101216"
mys = "Looking at dates between {0} and {1}".format(rndStr, rndStr2)

Please do not use str as a variable name; it overwrites the built-in str type.

rndStr = "20101215"
rndStr2 = "20101216"

print("Looking at dates between %s and %s" % (rndStr, rndStr2))
