Python: re.compile raises an error when the regex contains an escape character

When I call re.compile, the result differs depending on the position of the escape character:

re.compile('[:<>"\]+') -> re.error: unterminated character set at position 0

re.compile('[\:<>"]+') -> re.compile('[\:<>"]+')

python version info : sys.version_info(major=3, minor=10, micro=8, releaselevel='final', serial=0)

I think these two patterns are equivalent, but they give different results. What is the reason for the difference?


According to the re documentation, the backslash is used to escape special characters both in Python string literals and in the regular expression itself. You wanted to put a backslash in the character set, but in the first pattern the backslash escapes the terminating ] instead, so the set is never closed and the regex is invalid. In the second pattern the backslash sits in front of the :, so the ] still closes the set and the pattern compiles.
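As a rough illustration of the point above, the sketch below compiles both patterns from the question. Raw strings are used here (an assumption, the question used ordinary strings) so that the regex engine receives exactly the characters shown; the variable names are only illustrative.

```python
import re

# Raw strings pass the backslash to the regex engine unchanged.
bad_pattern = r'[:<>"\]+'    # \] is an escaped bracket, so the set never closes
good_pattern = r'[\:<>"]+'   # \: is an escaped colon; the ] still closes the set

try:
    re.compile(bad_pattern)
    error_message = None
except re.error as exc:
    # Python 3.10 reports "unterminated character set at position 0" here.
    error_message = str(exc)

compiled = re.compile(good_pattern)

print(error_message)
print(compiled.pattern)
```

Only the first pattern raises re.error; the second compiles, because its closing bracket is not escaped.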


The backslash character is always an escape character in re patterns, even in positions where it would not act as an escape in a normal Python string.

If it is followed by a punctuation character that has no special meaning, that character is simply matched literally. So the second example from the question, re.compile('[\:<>"]+'), will not match a backslash character, but will still match a :. This can be tested at pythex.org
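The same check can be done locally instead of on pythex.org. This is a minimal sketch, assuming the second pattern from the question written as a raw string; the variable names are made up for the example.

```python
import re

# Second pattern from the question: the backslash only escapes the colon.
pattern = re.compile(r'[\:<>"]+')

# A lone colon is matched by the character set.
matches_colon = pattern.fullmatch(':') is not None

# A lone backslash ('\\' is a one-character string) is not in the set.
matches_backslash = pattern.search('\\') is not None

# In the text a\:b only the colon is picked up.
found = pattern.findall(r'a\:b')

print(matches_colon, matches_backslash, found)
```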

To match a backslash character, you need to escape the backslash not only within the Python string, but again for the re parser.

This means that if you are using a standard string, you need four backslashes: re.compile('[:<>"\\\\]+')

If you use a raw string, you need two backslashes: re.compile(r'[:<>"\\]+')
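A short sketch confirming that the two spellings describe the same pattern and that it really matches a backslash. The sample text is a made-up example, not something from the question.

```python
import re

# Four backslashes in a standard string, two in a raw string.
standard = '[:<>"\\\\]+'
raw = r'[:<>"\\]+'

# Both literals produce the same 9-character pattern text.
same_pattern = standard == raw

pattern = re.compile(raw)

# Hypothetical sample text containing a colon, a backslash and angle brackets.
sample = r'C:\temp<x>'
found = pattern.findall(sample)

print(same_pattern, found)
```

The raw-string form is usually easier to read, since each backslash in the source corresponds to one backslash seen by the re parser.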
