Python regular expression inconsistency

I am getting different results based on whether I precompile a regular expression:

>>> re.compile('mr', re.IGNORECASE).sub('', 'Mr Bean')
' Bean'
>>> re.sub('mr', '', 'Mr Bean', re.IGNORECASE)
'Mr Bean'

The Python documentation says "Some of the functions are simplified versions of the full featured methods for compiled regular expressions." However, it also claims that RegexObject.sub() is "Identical to the sub() function."

So what is going on here?


It appears that re.sub() can't accept re.IGNORECASE.

The documentation states:

sub(pattern, repl, string, count=0)

Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl.  repl can be either a string or a callable;
if a string, backslash escapes in it are processed.  If it is
a callable, it's passed the match object and must return
a replacement string to be used.

Using this works in its place, however:

re.sub("(?i)mr", "", "Mr Bean")


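The module-level sub() call doesn't accept modifiers at the end. That fourth argument is "count": the maximum number of pattern occurrences to be replaced. The value of re.IGNORECASE is therefore read as a count, the match stays case-sensitive, and 'mr' never matches 'Mr'.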
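
To make the mix-up concrete, here is a small sketch of what the call in the question effectively does. It assumes re.IGNORECASE converts to a plain integer (its exact value is an implementation detail); the names count_value, unchanged and capped are made up for illustration.

```python
import re

# re.IGNORECASE behaves like an integer. The fourth positional slot of
# re.sub() is count, so the call in the question is equivalent to
# passing this number as count.
count_value = int(re.IGNORECASE)

# The pattern is still case-sensitive, so 'mr' never matches 'Mr'.
unchanged = re.sub('mr', '', 'Mr Bean', count=count_value)

# On a lowercase subject the pattern does match, and count only caps
# how many occurrences get replaced.
capped = re.sub('mr', '', 'mr mr mr mr', count=count_value)

print(count_value, repr(unchanged), repr(capped))
```

The second call shows that the flag value only limits the number of replacements; it never changes how the pattern matches.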


>>> help(re.sub)
Help on function sub in module re:

sub(pattern, repl, string, count=0)
    Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl.  repl can be either a string or a callable;
    if a callable, it's passed the match object and must return
    a replacement string to be used.

There is no function parameter in re.sub for regex flags (IGNORECASE, MULTILINE, DOTALL) as in re.compile.

Alternatives:

>>> re.sub("[Mm]r", "", "Mr Bean")
' Bean'

>>> re.sub("(?i)mr", "", "Mr Bean")
' Bean'

Edit: Python 3.1 added a flags parameter to re.sub() (Python 2.7 did the same); see http://docs.python.org/3.1/whatsnew/3.1.html. As of 3.1, the signature of e.g. re.sub looks like:

re.sub(pattern, repl, string[, count, flags])
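
On a version that has the flags parameter, passing it by keyword is probably the safest spelling, since it cannot land in the count slot by accident. A minimal sketch (the variable name result is just for illustration):

```python
import re

# Passing the flag by keyword keeps it out of the positional count slot.
result = re.sub('mr', '', 'Mr Bean', flags=re.IGNORECASE)

print(repr(result))
```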


From the Python 2.6.4 documentation:

re.sub(pattern, repl, string[, count])

In that version, re.sub() doesn't take a flags argument to set the regex mode. If you want re.IGNORECASE, use re.compile().sub() or put an inline (?i) in the pattern.
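
As a rough sketch of the compile-based route, which does not depend on whether re.sub() has a flags parameter: the helper name strip_mr and the constant MR_PATTERN are hypothetical, chosen only for this example.

```python
import re

# Compile once with the flag; the compiled pattern carries it, so its
# sub() method needs no flags argument.
MR_PATTERN = re.compile('mr', re.IGNORECASE)


def strip_mr(text):
    """Remove every case-insensitive occurrence of 'mr' from text."""
    return MR_PATTERN.sub('', text)


print(repr(strip_mr('Mr Bean')))
```

Compiling up front also avoids re-parsing the pattern when the same substitution runs many times.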
