[python]: Problem with Python string literals
code goes below:
import re
line = r'abc\def\n'
rline = re.sub('\\\\', '+', line)  # rline should then be 'abc+def+n'
All I want is to replace the backslashes in line with '+'. I thought a single backslash could be written as '\\', so why do I have to use '\\\\' to get re.sub to work correctly?
I'm confused.
It's a good habit to always use raw strings when dealing with regex patterns:
In [45]: re.sub(r'\\', r'+', line)
Out[45]: 'abc+def+n'
To answer your question, though: Python interprets '\\\\' as two backslash characters:
In [44]: list('\\\\')
Out[44]: ['\\', '\\']
And the rules of regex interpret two backslash characters as one literal backslash.
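As a quick check of the point above, here is a minimal sketch (the variable names pattern and raw_pattern are just for illustration) showing that the four-backslash literal and the raw two-backslash literal are the same two-character string, and that the regex engine treats it as one literal backslash:

```python
import re

pattern = '\\\\'      # non-raw literal: four typed backslashes
raw_pattern = r'\\'   # raw literal: two typed backslashes

print(len(pattern))             # 2
print(pattern == raw_pattern)   # True

line = r'abc\def\n'
result = re.sub(raw_pattern, '+', line)
print(result)                   # abc+def+n
```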
Because there are two levels of backslashing:
- re.sub uses \ as an escape
- Python uses \ as an escape (unless you do r'...')
So: \\\\ (what you type in Python source) -> \\ (what re.sub receives) -> \ (the literal backslash that gets matched)
EDIT: And then there is the SO (Stack Overflow) level of backslashing in the post editor! (It got me!)
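To see the two levels separately, here is a small sketch (the names one_level and two_levels are made up for this example). With only one level of escaping, the regex engine receives a lone backslash and rejects the pattern; with two levels it receives an escaped backslash and matches a single character:

```python
import re

one_level = '\\'     # Python layer leaves one backslash character
two_levels = '\\\\'  # Python layer leaves two backslash characters

try:
    re.compile(one_level)    # regex layer sees a dangling escape
    compiled_ok = True
except re.error:
    compiled_ok = False

# r'a\b' is the three characters a, backslash, b
match = re.search(two_levels, r'a\b')

print(compiled_ok)           # False
print(repr(match.group()))   # '\\'  (one backslash character)
```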
If you want to search for a literal pattern, not an actual regular expression, you should use both raw strings and re.escape() to avoid doubling backslashes or any other manual escaping completely.
So, your example would become:
import re
line = r'abc\def\n'
backslash = re.escape('\\')  # '\\' is one backslash; a raw literal cannot end in a lone backslash
rline = re.sub(backslash, '+', line)
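For what it's worth, here is a short sketch of what re.escape() produces for a single backslash, written as '\\' in the source. The names via_regex and via_replace are just illustrative. It also shows, as an aside that the answers above don't cover, that plain str.replace gives the same result for a purely literal substitution, since no regex layer is involved:

```python
import re

line = r'abc\def\n'
backslash = '\\'                 # a single backslash character
pattern = re.escape(backslash)   # escaped for the regex layer

print(len(pattern))              # 2

via_regex = re.sub(pattern, '+', line)
via_replace = line.replace(backslash, '+')   # literal replacement, no regex

print(via_regex)                 # abc+def+n
print(via_regex == via_replace)  # True
```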