Python regular expression inconsistency
I am getting different results based on whether I precompile a regular expression:
>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean')
' Bean'
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE)
'Mr Bean'
The Python documentation says "Some of the functions are simplified versions of the full featured methods for compiled regular expressions." However, it also claims that RegexObject.sub() is "Identical to the sub() function."
So what is going on here?
It appears that re.sub() can't accept the re.IGNORECASE flag.
The documentation states:
sub(pattern, repl, string, count=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of the pattern in string by the replacement repl. repl can be either a string or a callable; if a string, backslash escapes in it are processed. If it is a callable, it's passed the match object and must return a replacement string to be used.
Using this works in its place, however:
re.sub("(?i)mr", "", "Mr Bean")
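The module-level sub() call doesn't accept modifiers at the end. The fourth positional argument is "count", the maximum number of pattern occurrences to be replaced. Since re.IGNORECASE is just an integer constant (2), it is silently accepted as count=2, and the match stays case-sensitive.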
>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.
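To make the "count" mix-up concrete, here is a small sketch. It passes the flag's integer value as count explicitly by keyword, which is the assumed equivalent of what the positional call in the question does. The sample strings are made up for illustration.

```python
import re

# re.IGNORECASE is an integer-valued flag; its numeric value is 2.
flag_as_int = int(re.IGNORECASE)

# Supplying that value as count limits the call to two replacements,
# and matching is still case-sensitive.
limited = re.sub("mr", "X", "mr mr mr", count=flag_as_int)

# In the question's input there is no lowercase "mr" at all,
# so nothing is replaced and the string comes back unchanged.
unchanged = re.sub("mr", "", "Mr Bean", count=flag_as_int)

print(flag_as_int)      # 2
print(repr(limited))    # 'X X mr'
print(repr(unchanged))  # 'Mr Bean'
```

So the call does not fail. It simply does something other than what was intended.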
There is no function parameter in re.sub for regex flags (IGNORECASE, MULTILINE, DOTALL) as there is in re.compile.
Alternatives:
>>> re.sub("[Mm]r", "", "Mr Bean")
' Bean'
>>> re.sub("(?i)mr", "", "Mr Bean")
' Bean'
Edit: Python 3.1 added a flags parameter to re.sub(), re.subn() and re.split() (see http://docs.python.org/3.1/whatsnew/3.1.html). As of 3.1, the signature of e.g. re.sub looks like:
re.sub(pattern, repl, string[, count, flags])
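On a Python version that has the flags parameter, passing it by keyword avoids any confusion with count. A minimal sketch, where the second input string is invented for illustration:

```python
import re

# Keyword form: the flag cannot be mistaken for count.
result = re.sub("mr", "", "Mr Bean", flags=re.IGNORECASE)
print(repr(result))  # ' Bean'

# count and flags can be combined when both are given by keyword.
first_only = re.sub("mr", "", "Mr and mr", count=1, flags=re.IGNORECASE)
print(repr(first_only))  # ' and mr'
```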
From the Python 2.6.4 documentation:
re.sub(pattern, repl, string[, count])
In that version, re.sub() doesn't take a flag to set the regex mode. If you want re.IGNORECASE, use re.compile().sub() or put an inline (?i) in the pattern.
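A short sketch of the compiled-pattern route, which also works on versions without a flags parameter. The constant name MR and the sample names are hypothetical, chosen only for this example.

```python
import re

# Compile once with the flag, then reuse the pattern object.
MR = re.compile("mr", re.IGNORECASE)

names = ["Mr Bean", "MR Smith", "mr Jones"]  # made-up sample data
stripped = [MR.sub("", name).strip() for name in names]
print(stripped)  # ['Bean', 'Smith', 'Jones']

# The inline (?i) flag gives the same result without compiling.
inline = [re.sub("(?i)mr", "", name).strip() for name in names]
print(inline == stripped)  # True
```

Compiling up front also keeps the flag next to the pattern, so it cannot end up in the wrong argument position.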