How to sub with matched groups and variables in Python

New to Python. This is probably simple, but I haven't found an answer.

import re
rndStr = "20101215"
rndStr2 = "20101216"
str = "Looking at dates between 20110316 and 20110317"
outstr = re.sub("(.+)([0-9]{8})(.+)([0-9]{8})",r'\1'+rndStr+r'\2'+rndStr2,str)

The output I'm looking for is:

Looking at dates between 20101215 and 20101216

But instead I get:

P101215101216

The values of the two rndStr variables don't really matter. Assume they're random or taken from user input (I put static values here to keep it simple). Thanks for any help.


Your backreferences are ambiguous. Your replacement string becomes

\120101215\220101216

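which no longer refers to groups 1 and 2 :) The digits of rndStr run straight into the backreference, and the re module reads \120 and \220 as three-digit octal escapes. That is where the stray P (character 0o120) in your output comes from.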
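If you want to see the ambiguity for yourself, here is a small sketch that builds the same replacement string and runs it under Python 3. The names template and result are mine, not from the question, and the output shown assumes the octal-escape handling described in the re documentation for replacement templates.

```python
import re

rndStr = "20101215"
rndStr2 = "20101216"
s = "Looking at dates between 20110316 and 20110317"

# Concatenation glues the digits of each value onto the backreference before it.
template = r'\1' + rndStr + r'\2' + rndStr2
print(template)        # \120101215\220101216

# re reads \120 and \220 as three-digit octal escapes, not as groups 1 and 2,
# so the whole match is replaced by two odd characters plus leftover digits.
result = re.sub("(.+)([0-9]{8})(.+)([0-9]{8})", template, s)
print(repr(result))    # 'P101215\x90101216'
```

The second character (\x90) is not printable, which is why only the P shows up on screen.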

To solve it, use this syntax:

r'\g<1>'+rndStr+r'\g<2>'+rndStr2 

You also have too many sets of parentheses (or "brackets" if you speak British English like me:) - you don't need parentheses around the [0-9]{8} parts which you're not backreferencing:

re.sub("(.+)[0-9]{8}(.+)[0-9]{8}",...

should be sufficient.

(And, as noted elsewhere, don't use str as a variable name. Unless you want to spend ages debugging why str.replace() doesn't work anymore. Not that I ever did that once... noooo. :)

so the whole thing becomes:

import re
rndStr = "20101215"
rndStr2 = "20101216"
s = "Looking at dates between 20110316 and 20110317"
outstr = re.sub(r"(.+)[0-9]{8}(.+)[0-9]{8}", r'\g<1>'+rndStr+r'\g<2>'+rndStr2, s)
print(outstr)

Producing:

Looking at dates between 20101215 and 20101216
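Since the question mentions that the values might come from user input, one alternative worth considering is passing a function as the replacement, so the inserted text is never parsed as a template at all. This is only a sketch; replace_dates is a hypothetical helper name, not something from the question.

```python
import re

def replace_dates(text, first, second):
    """Swap the two 8-digit dates in text for first and second."""
    pattern = r"(.+)[0-9]{8}(.+)[0-9]{8}"

    def build(match):
        # The return value is inserted literally, so digits or backslashes
        # in first/second cannot be mistaken for group references.
        return match.group(1) + first + match.group(2) + second

    return re.sub(pattern, build, text)

s = "Looking at dates between 20110316 and 20110317"
print(replace_dates(s, "20101215", "20101216"))
# Looking at dates between 20101215 and 20101216
```

The \g<1> form is shorter when you control the values; the function form is the safer choice when you don't.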


Notice that if you change the value of rndStr or rndStr2 to text (like 'abc') rather than digits, you get something closer to the expected result.

In your call to re.sub you have r'\1'+rndStr+... This concatenates to r'\120101215', so the digits of rndStr are read as part of the escape sequence rather than as literal text following group 1, which is probably not what you intended...

You can use named back references to make the back reference unambiguous:

import re

rep1 = "20101215"
rep2 = "20101216"
st = "Looking at dates between 20110316 and 20110317"

print(re.sub(r'(?P<fp>.+)[0-9]{8}(?P<lp>.+)[0-9]{8}',
             r'\g<fp>'+rep1+r'\g<lp>'+rep2, st))

Better still, use an easier-to-understand syntax and check whether the match succeeded:

m = re.search(r'(?P<fp>.+)[0-9]{8}(?P<lp>.+)[0-9]{8}', st)
if m:
    print(m.group('fp')+rep1+m.group('lp')+rep2)  # you could use m.group(1) and m.group(2) too
else:
    print("no match...")

In either case, your desired string of Looking at dates between 20101215 and 20101216 is produced.

The Python docs on named backreferences:

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name 'name'. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named 'id' in the example below can also be referenced as the numbered group 1.

For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
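A quick sketch of each kind of reference the docs mention, using the same (?P<id>...) pattern. The sample strings and variable names are my own, chosen only for illustration.

```python
import re

pattern = re.compile(r"(?P<id>[a-zA-Z_]\w*)")

# By name (or by number) through the match object.
m = pattern.search("  total = 1")
print(m.group('id'), m.group(1), m.end('id'))   # total total 7

# By name inside the regular expression itself, via (?P=id).
doubled = re.search(r"(?P<id>[a-zA-Z_]\w*) (?P=id)", "the the cat")
print(doubled.group(0))                          # the the

# By name in replacement text given to sub(), via \g<id>.
wrapped = pattern.sub(r"<\g<id>>", "x = y")
print(wrapped)                                   # <x> = <y>
```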


rndStr = "20101215"
rndStr2 = "20101216"
mys = "Looking at dates between {0} and {1}".format(rndStr, rndStr2)

Please do not use str as a variable name; it shadows the built-in str type.
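A minimal sketch of what goes wrong, kept inside a function so the damage stays local. shadowing_demo is just an illustrative name.

```python
def shadowing_demo():
    str = "Looking at dates"          # the local name now hides the built-in type
    try:
        return str(20101215)          # this calls the string, not the type
    except TypeError as exc:
        return "TypeError: %s" % exc

print(shadowing_demo())
```

At module level the effect is the same but lasts for the rest of the script, which is what makes it hard to track down.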


rndStr = "20101215"
rndStr2 = "20101216"

print("Looking at dates between %s and %s" % (rndStr, rndStr2))