[python]: Problem with Python string literals

The code is below:

import re

line = r'abc\def\n'
rline = re.sub('\\\\', '+', line)  # then rline should be r'abc+def+n'

All I want is to replace the backslashes in line with '+'. I thought a backslash could be written as '\\' in a string literal, so why do I need '\\\\' to get re.sub to work correctly?

I'm confused.


It's a good habit to always use raw strings when dealing with regex patterns:

In [45]: re.sub(r'\\', r'+', line)
Out[45]: 'abc+def+n'

To answer your question though, Python interprets '\\\\' as two backslash characters:

In [44]: list('\\\\')
Out[44]: ['\\', '\\']

And the rules of regex interpret two backslash characters as one literal backslash.
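
To check this outside an interactive session, here is a minimal sketch. The variable names are my own, not from the question. It only shows that the four-backslash literal and the raw two-backslash literal are the same two-character string, and that either one works as the pattern:

```python
import re

line = r'abc\def\n'        # 9 characters, two of them backslashes

plain_pattern = '\\\\'     # ordinary literal: Python collapses it to 2 characters
raw_pattern = r'\\'        # raw literal: the 2 characters are kept as written

same_pattern = plain_pattern == raw_pattern
pattern_length = len(plain_pattern)
result = re.sub(raw_pattern, '+', line)

print(same_pattern)    # True
print(pattern_length)  # 2
print(result)          # abc+def+n
```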


Because there are two levels of backslashing:

  1. re.sub uses \ as an escape
  2. Python uses \ as an escape (unless you do r'...')

So \\\\ (python) -> \\ (re.sub) -> \

EDIT

And the SO level of backslashing! (it got me!)
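
A rough sketch of the two levels, one step at a time (names are illustrative only). The last part shows what happens if you skip a level: a pattern holding a single backslash reaches the regex engine as an unfinished escape and is rejected.

```python
import re

line = r'abc\def\n'

# Level 1: the Python parser turns the four-backslash literal into two characters.
pattern = '\\\\'
chars_after_python = list(pattern)   # two one-character strings, each a backslash

# Level 2: the regex engine reads those two characters as one escaped backslash.
matches = re.findall(pattern, line)  # one match per backslash in line

# Skipping a level: '\\' is a single backslash, an incomplete escape for the regex engine.
try:
    re.compile('\\')
    single_backslash_ok = True
except re.error:
    single_backslash_ok = False

print(len(chars_after_python))  # 2
print(len(matches))             # 2
print(single_backslash_ok)      # False
```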


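If you want to search for a literal string rather than an actual regular expression, build the pattern with re.escape(). It handles the regex-level escaping for you, so you never have to double backslashes or escape other special characters by hand. Raw strings help too, with one exception: a raw string cannot end in a backslash, so a lone backslash still has to be written as '\\'.

So, your example would become:

import re

line = r'abc\def\n'
backslash = re.escape('\\')  # r'\' would be a syntax error
rline = re.sub(backslash, '+', line)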
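
The benefit is easier to see with a longer literal. The sketch below is my own example, not part of the question: it treats r'\d' as the two characters backslash and d. Without re.escape() the regex engine would read that as the digit class.

```python
import re

line = r'abc\def\n'

# re.escape adds the regex-level escaping: one backslash in, two out.
escaped = re.escape('\\')
rline = re.sub(escaped, '+', line)

# r'\d' is meant literally here (backslash followed by 'd'), not as "any digit".
literal = r'\d'
only_first = re.sub(re.escape(literal), '+', line)

print(escaped == r'\\')  # True
print(rline)             # abc+def+n
print(only_first)        # abc+ef\n
```

Only the first backslash is replaced in the last call, because the second one is followed by n rather than d.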