Two questions about Python regular expressions

2023-03-28 19:58 Q&A author:

Q1. Why can't we use a word boundary or a backreference without putting r at the start of the regex? For example, '\b[a-z]{5}\d{3}\b' does not work, but r'\b[a-z]{5}\d{3}\b' does.

Q2. Why does Python not support variable-length negative lookbehind assertions when it does support variable-length negative lookahead assertions? C# supports both, and I think variable-length negative lookbehind would be an excellent feature to have in Python as well.

Please clarify these two concepts. Thanks.

It does work without raw strings:

'\\b[a-z]{5}\\d{3}\\b'

You just need to double-escape the special characters (in effect, you escape every backslash).
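To make the difference concrete, here is a small sketch (the sample text is made up for illustration) comparing the unescaped, double-escaped, and raw spellings of the pattern from the question. In a normal string literal, Python converts \b to a backspace character before the regex engine ever sees it:

```python
import re

text = "id: hello123 end"  # made-up sample input

# "\b" in a normal literal becomes a backspace (0x08); "\\d" is written
# escaped here only to avoid an invalid-escape warning.
plain = '\b[a-z]{5}\\d{3}\b'
escaped = '\\b[a-z]{5}\\d{3}\\b'
raw = r'\b[a-z]{5}\d{3}\b'

print(plain[0] == '\x08')            # True: first character is a backspace
print(escaped == raw)                # True: both spell the same pattern
print(re.search(plain, text))        # None: it looks for literal backspaces
print(re.search(raw, text).group())  # hello123
```

The double-escaped and raw versions are the same string, so which one you use is purely a readability choice.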

Variable-length assertions are one of those features that some implementations support and some don't. Check out the regex module on PyPI for a version with more features and better Unicode support, which may eventually replace the standard library re.
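As a rough illustration using only the standard re module, the sketch below shows a variable-length lookahead compiling fine while the equivalent lookbehind is rejected. The helper find_foo_not_after_bar is a hypothetical workaround, not an official recipe: it captures the optional prefix and filters in Python, which approximates a negative lookbehind for simple cases but can differ when matches overlap.

```python
import re

# Variable-length lookahead is accepted by the standard re module.
lookahead = re.compile(r'foo(?!\s+bar)')

# A variable-length lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<!bar\s+)foo')
    compiled = True
except re.error as exc:
    compiled = False
    print("re refused the pattern:", exc)


def find_foo_not_after_bar(text):
    """Hypothetical workaround: return start offsets of 'foo' not preceded by 'bar' + spaces."""
    starts = []
    for m in re.finditer(r'(bar\s+)?foo', text):
        if m.group(1) is None:  # no "bar " prefix was consumed
            starts.append(m.start())
    return starts


print(find_foo_not_after_bar("bar   foo and foo"))  # [14]
```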

Edit: To make the version from your comment work without raw strings, use:

re.sub('[a-z]+(\\d+)', '\\1', string)

Again, Python interprets backslashes: it treats \1 as the character with code 1. If you actually mean the backreference \1, you need to escape the backslash by writing \\1, or use raw strings.
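A quick way to see this is to run the three spellings of the replacement side by side. The input string here is made up for illustration:

```python
import re

string = "abc123 def45"  # made-up sample input

# '\1' is an octal escape: the replacement is chr(1), not a backreference.
wrong = re.sub('[a-z]+(\\d+)', '\1', string)
# '\\1' reaches the regex engine as \1, i.e. "group 1".
right = re.sub('[a-z]+(\\d+)', '\\1', string)
# The raw-string spelling is equivalent and easier to read.
raw = re.sub(r'[a-z]+(\d+)', r'\1', string)

print(repr(wrong))  # '\x01 \x01'
print(repr(right))  # '123 45'
print(right == raw) # True
```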

Edit 2: Adding the link from @Nate's comment to the list of Python escape sequences.

Regarding your first question, this is because the r prefix designates a "raw string". Without it, your backslashes are interpreted as escape codes. If you don't want to use raw strings, you can write '\\b[a-z]{5}\\d{3}\\b', although this is far less readable. You can read more detail about raw strings here.

In regards to your second question, you should take a look at this excellent question, which discusses the differences between various flavors of regular expression used by different languages (namely C#, Java, and Python).

You can find almost all of this information in the tutorial, which is your best friend:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
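The quoted passage can be checked directly. This is a minimal sketch with a made-up path string; it shows the lengths of the two literals and that the four-backslash and raw spellings both match a single literal backslash:

```python
import re

# r"\n" keeps the backslash; "\n" is a single newline character.
print(len(r"\n"), len("\n"))  # 2 1

path = "C:\\temp"  # made-up sample: the text C:\temp, with one backslash

# The regex needed to match one backslash is \\ (two characters).
normal = re.findall('\\\\', path)  # normal literal: four backslashes
raw = re.findall(r'\\', path)      # raw literal: two backslashes

print(normal == raw)  # True
print(len(normal))    # 1: the single backslash in the path
```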

Your second question is quite hard to answer. I think the authors implement only those features they consider necessary. They try to add code that is useful for most users, but it is impossible to implement every feature quickly.
