How to sub with matched groups and variables in Python
New to Python. This is probably simple, but I haven't found an answer.
import re
rndStr = "20101215"
rndStr2 = "20101216"
str = "Looking at dates between 20110316 and 20110317"
outstr = re.sub("(.+)([0-9]{8})(.+)([0-9]{8})",r'\1'+rndStr+r'\2'+rndStr2,str)
The output I'm looking for is:
Looking at dates between 20101215 and 20101216
But instead I get:
P101215101216
The values of the two rndStr variables don't really matter. Assume they're random or taken from user input (I used static values here to keep it simple). Thanks for any help.
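Your backreferences are ambiguous. Your replacement string becomes
\120101215\220101216
and Python can't tell where each group number ends. Instead of group 1 and group 2 followed by your dates, it reads \120 and \220 as octal escapes (the characters 'P' and '\x90'), which is where the stray P in your output comes from :)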
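To see what happens to those digits, you can run the same kind of replacement on a throwaway string. This is a small illustrative check; the one-character pattern and input are made up for the demo. In CPython's re module, a backslash followed by three octal digits in the replacement template is read as an octal escape, so \120 becomes chr(0o120), which is 'P'.

```python
import re

# Same shape of replacement the question built: r'\1' glued to a date string
template = r'\1' + "20101215"

# The template parser reads \120 as octal 0o120 (== 80 == 'P'), not as group 1
result = re.sub(r"(x)", template, "x")
print(repr(result))  # 'P101215'

# The \g<1> form terminates the group reference explicitly
fixed = re.sub(r"(x)", r"\g<1>" + "20101215", "x")
print(repr(fixed))  # 'x20101215'
```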
To solve it, use this syntax:
r'\g<1>'+rndStr+r'\g<2>'+rndStr2
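You also have too many sets of parentheses, or "brackets" if you speak British English like me :) You don't need parentheses around the [0-9]{8} parts, since you're not backreferencing them. Keeping them also shifts the numbering: in your pattern the second (.+) is group 3, so \2 would give you the first date rather than " and ":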
re.sub("(.+)[0-9]{8}(.+)[0-9]{8}",...
should be sufficient.
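A quick way to check the group numbering is to match both patterns against the question's sample string and print the captured groups. This is just an illustrative sketch using re.match and Match.groups():

```python
import re

s = "Looking at dates between 20110316 and 20110317"

# Original pattern: four capturing groups, two of them around the dates
m = re.match("(.+)([0-9]{8})(.+)([0-9]{8})", s)
print(m.groups())
# ('Looking at dates between ', '20110316', ' and ', '20110317')

# Trimmed pattern: only the text before each date is captured
m2 = re.match("(.+)[0-9]{8}(.+)[0-9]{8}", s)
print(m2.groups())
# ('Looking at dates between ', ' and ')
```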
And, as noted elsewhere, don't use str as a variable name, unless you want to spend ages debugging why calls like str(x) don't work anymore. Not that I ever did that once... noooo. :)
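Here is a minimal sketch of that shadowing problem. The function name call_after_shadowing is hypothetical, chosen only for this illustration; the shadowing is kept local to the function so the built-in stays usable afterwards.

```python
def call_after_shadowing():
    """Bind a local called `str`, then try to use it as the built-in type."""
    str = "Looking at dates between 20110316 and 20110317"  # shadows builtins.str
    try:
        str(42)  # calls the string object, not the str type
    except TypeError:
        return "TypeError: the name str no longer refers to the built-in type"
    return "no error"

print(call_after_shadowing())

# Outside the function the built-in is untouched, so this still works
print(str(42))
```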
So the whole thing becomes:
import re
rndStr = "20101215"
rndStr2 = "20101216"
s = "Looking at dates between 20110316 and 20110317"
outstr = re.sub(r"(.+)[0-9]{8}(.+)[0-9]{8}", r'\g<1>'+rndStr+r'\g<2>'+rndStr2, s)
print(outstr)
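Since the question says the values might come from user input, another option worth knowing is passing a function as the repl argument of re.sub. Its return value is inserted literally, so there is no template to misparse and backslashes in the input are harmless. This is a sketch of an alternative, not what the answer above uses, and the iterator trick assumes the dates should be replaced in order of appearance.

```python
import re

rndStr = "20101215"
rndStr2 = "20101216"
s = "Looking at dates between 20110316 and 20110317"

# Replacement values, consumed left to right as each 8-digit run is found
replacements = iter([rndStr, rndStr2])

# re.sub calls the lambda once per match and uses its return value verbatim,
# so no \1 / \g<1> template parsing happens at all
outstr = re.sub(r"[0-9]{8}", lambda m: next(replacements), s)
print(outstr)  # Looking at dates between 20101215 and 20101216

# A value containing a backslash is also inserted as-is
print(re.sub(r"[0-9]{8}", lambda m: r"C:\temp", "dir 12345678"))  # dir C:\temp
```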
Producing:
Looking at dates between 20101215 and 20101216
Notice that if you change the value of rndStr or rndStr2 to text (like 'abc') rather than digits, you get something closer to the expected result?
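Trying that with the question's original pattern and template shows why: with a letter right after \1 and \2, both group references parse normally. The output is still not quite right, because that pattern has four groups and \2 refers to the first date. A short illustrative run:

```python
import re

s = "Looking at dates between 20110316 and 20110317"

# Same (buggy) pattern and template shape as the question, but with letters
out = re.sub("(.+)([0-9]{8})(.+)([0-9]{8})", r"\1" + "abc" + r"\2" + "abc", s)
print(out)  # Looking at dates between abc20110316abc
```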
In your expression to re.sub you have r'\1'+rndStr+... This combines into '\1'+'20101215', which the regex engine then reads as \120101215 rather than as group 1 followed by a date. That is probably not what you intended...
You can use named back references to make the back reference unambiguous:
import re
rep1 = "20101215"
rep2 = "20101216"
st = "Looking at dates between 20110316 and 20110317"
print(re.sub(r'(?P<fp>.+)[0-9]{8}(?P<lp>.+)[0-9]{8}',
             r'\g<fp>'+rep1+r'\g<lp>'+rep2, st))
Better still, use easier-to-understand syntax and check the result of the attempted match:
m = re.search(r'(?P<fp>.+)[0-9]{8}(?P<lp>.+)[0-9]{8}', st)
if m:
    print(m.group('fp') + rep1 + m.group('lp') + rep2)  # you could use m.group(1) too
else:
    print("no match...")
In either case, your desired string, Looking at dates between 20101215 and 20101216, is produced.
The Python docs on named backreferences:
(?P<name>...)
Similar to regular parentheses, but the substring matched by the group is accessible within the rest of the regular expression via the symbolic group name 'name'. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. So the group named 'id' in the example below can also be referenced as the numbered group 1. For example, if the pattern is (?P<id>[a-zA-Z_]\w*), the group can be referenced by its name in arguments to methods of match objects, such as m.group('id') or m.end('id'), and also by name in the regular expression itself (using (?P=id)) and replacement text given to .sub() (using \g<id>).
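The three places the documentation mentions (match-object methods, the pattern itself via (?P=name), and sub() replacement text via \g<name>) can be seen together in a small example. The sample strings and the group name word are illustrative, not from the question.

```python
import re

# (?P<id>...) defines a named group; it is also numbered group 1
m = re.search(r"(?P<id>[a-zA-Z_]\w*)", "  rndStr = 1")
print(m.group("id"), m.group(1), m.end("id"))  # rndStr rndStr 8

# (?P=word) refers back to the named group inside the pattern itself,
# and \g<word> refers to it in the replacement text given to sub()
print(re.sub(r"(?P<word>\w+) (?P=word)", r"\g<word>", "hello hello world"))
# hello world
```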
If the surrounding sentence is fixed, you don't need a regex at all; just build the string:
rndStr = "20101215"
rndStr2 = "20101216"
mys = "Looking at dates between {0} and {1}".format(rndStr, rndStr2)
Please do not use str as a variable name; it shadows the built-in str type.
rndStr = "20101215"
rndStr2 = "20101216"
print("Looking at dates between %s and %s" % (rndStr, rndStr2))