Why is there an extra result handed back to me during this Python regex example?
Code:
re.findall(r'(/\d\d\d\d)?', '/2000')
Result:
['/2000', '']
Code:
re.findall(r'/\d\d\d\d?', '/2000')
Result:
['/2000']
Why is the extra '' returned in the first example? I am using the first example for Django URL configuration. Is there a way I can prevent matching of ''?
Because the parentheses define a group, and the ? that follows asks for 0 or 1 repetitions of that group. Thus both the empty string and /2000 match.
The ? operator matches 0 or 1 repetitions of the preceding expression. In the first case the preceding expression is the whole group (/\d\d\d\d), while in the second it is only the last \d. Therefore, in the first case the empty string "" is also matched, since it contains zero repetitions of (/\d\d\d\d).
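To make the difference in scope concrete, here is a small sketch. The inputs '/200' and 'abc' are made up for illustration and are not from the question.

```python
import re

# Second pattern: the ? applies only to the final \d, so the fourth
# digit is optional but the slash and first three digits are required.
three_digits = re.findall(r'/\d\d\d\d?', '/200')
four_digits = re.findall(r'/\d\d\d\d?', '/2000')

# First pattern: the ? applies to the whole group, so the pattern can
# match the empty string at every position, even in unrelated text.
empty_matches = re.findall(r'(/\d\d\d\d)?', 'abc')

print(three_digits)   # ['/200']
print(four_digits)    # ['/2000']
print(empty_matches)  # ['', '', '', '']
```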
Here is what is happening: the regex engine starts with its pointer before the first character of the target string. It greedily consumes the whole string and places the match result in the first list element. This leaves the internal pointer at the end of the string. But since the pattern can also match nothing, it successfully matches again at the position at the end of the string. Thus, there are two elements in the list.
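The two match positions can be inspected with re.finditer, which reports where each match starts and ends. The sketch below also shows two possible ways to drop the empty result, assuming the goal is only to get rid of '' in the findall output: filter it out, or make the group required by removing the ?. Whether either fits a given Django URL pattern depends on whether the year part really needs to be optional there.

```python
import re

pattern = r'(/\d\d\d\d)?'
text = '/2000'

# Each match's (start, end) span and matched text: one match covers the
# whole string, and a second, empty match sits at the end of it.
spans = [(m.span(), m.group(0)) for m in re.finditer(pattern, text)]
print(spans)  # [((0, 5), '/2000'), ((5, 5), '')]

# Option 1: keep the optional group but discard empty results.
non_empty = [m for m in re.findall(pattern, text) if m]
print(non_empty)  # ['/2000']

# Option 2: make the group required, so the empty string cannot match.
required = re.findall(r'(/\d\d\d\d)', text)
print(required)  # ['/2000']
```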