
How do I match the pattern "begins with A or ends with B" with a Python regular expression?

r'(^|^A)(\S+)(B$|$)'

ends up matching everything, which is effectively the same as ^\S+$.

How do I write one that matches "begins with A or ends with B", possibly both but not neither?

PS: I also need to refer to the group (\S+) in the replacement string.

Example:

Match Aanything and anythingB, and refer to the anything group in the replacement.


(^A.*B$)|(^A.*$)|(^.*B$)


Is this the desired behavior?

var rx = /^((?:A)?)(.*?)((?:B)?)$/;
"Aanything".match(rx)
> ["Aanything", "A", "anything", ""]
"anythingB".match(rx)
> ["anythingB", "", "anything", "B"]
"AanythingB".match(rx)
> ["AanythingB", "A", "anything", "B"]
"anything".match(rx)
> ["anything", "", "anything", ""]
"AanythingB".replace(rx, '$1nothing$3');
> "AnothingB"
"AanythingB".replace(rx, '$2');
> "anything"
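
For reference, a rough Python port of the JavaScript session above might look like the following. The helper name split_affixes is made up for illustration, and the extra check that rejects strings with neither affix is an assumption about what the question wants, since the pattern on its own also matches plain "anything".

```python
import re

# Same pattern as the JavaScript version above
RX = re.compile(r'^((?:A)?)(.*?)((?:B)?)$')

def split_affixes(s):
    """Return (prefix, middle, suffix), or None if neither A nor B is present."""
    m = RX.match(s)
    # The pattern alone also accepts "anything", so reject the case
    # where both the prefix group and the suffix group are empty.
    if m is None or not (m.group(1) or m.group(3)):
        return None
    return m.groups()

print(split_affixes("Aanything"))   # ('A', 'anything', '')
print(split_affixes("anythingB"))   # ('', 'anything', 'B')
print(split_affixes("AanythingB"))  # ('A', 'anything', 'B')
print(split_affixes("anything"))    # None
```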


^A|B$ or ^A|.*B$ (depending whether the match function is matching from the beginning)
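
A quick way to try these two patterns in Python, as a sketch only: re.search scans the whole string, so ^A|B$ is enough there, while re.match only tries at the start of the string, hence the .* in front of B. The function names are hypothetical, and this only tests whether a string qualifies; it does not capture the middle part.

```python
import re

def qualifies_search(s):
    # re.search may start matching anywhere, so B$ can be found at the end
    return re.search(r'^A|B$', s) is not None

def qualifies_match(s):
    # re.match only tries at index 0, so .* has to consume everything before B
    return re.match(r'^A|.*B$', s) is not None

for s in ("Aanything", "anythingB", "AanythingB", "anything"):
    print(s, qualifies_search(s), qualifies_match(s))
```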

UPDATE

It's difficult to write a single regexp for this.

One possibility is:

import re

# The outer (?:...) makes ^ and $ apply to both alternatives
match = re.match(r'^(?:A(\S+)|(\S+)B)$', string)
if match:
    # match.groups() is either (capture, None) or (None, capture)
    capture = match.group(1) or match.group(2)


Try this:

/(^A|B$)/


Problem solved.

I use this regex in Python; I found this in the Python manual:

(?(id/name)yes-pattern|no-pattern) Will try to match with yes-pattern if the group with given id or name exists, and with no-pattern if it doesn't. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, which will match with '<user@host.com>' as well as 'user@host.com', but not with '<user@host.com'.

New in version 2.4.

So my final answer is:

r'(?P<prefix>A)?(?P<key>\S+)(?(prefix)|B)'

Commands:

>>> re.sub(r'(?P<prefix>A)?(?P<key>\S+)(?(prefix)|B)', r'\g<key>', "Aanything")

'anything'

>>> re.sub(r'(?P<prefix>A)?(?P<key>\S+)(?(prefix)|B)', r'\g<key>', "anythingB")

'anything'

AanythingB gives me anythingB back, but I don't mind that:

>>> re.sub(r'(?P<prefix>A)?(?P<key>\S+)(?(prefix)|B)', r'\g<key>', "AanythingB")

'anythingB'
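
If stripping both affixes in the AanythingB case were wanted, one possible variation (not the pattern above, just a sketch) is to make the key group lazy, allow an optional B after an A prefix, and use re.fullmatch so the whole string has to be consumed. The name extract_key is hypothetical, and re.fullmatch needs Python 3.4 or later.

```python
import re

# Variation on the conditional pattern: lazy key, and B? allowed after an A prefix.
KEY_RX = re.compile(r'(?P<prefix>A)?(?P<key>\S+?)(?(prefix)B?|B)')

def extract_key(s):
    """Return the part between the affixes, or None if neither affix is present."""
    # fullmatch anchors the pattern at both ends of the string
    m = KEY_RX.fullmatch(s)
    return m.group('key') if m else None

print(extract_key("Aanything"))   # anything
print(extract_key("anythingB"))   # anything
print(extract_key("AanythingB"))  # anything
print(extract_key("anything"))    # None
```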


If you don't mind the extra weight in the case where both prefix "A" and suffix "B" exist, you can use a shorter regex:

reMatcher = re.compile(r"(?<=\AA).*|.*(?=B\Z)")

(using \A for ^ and \Z for $)

This one keeps the "A" prefix (instead of the "B" suffix, as in your solution) when both "A" and "B" are at their respective ends:

'A text here' matches ' text here'
'more text hereB' matches 'more text here'
'AYES!B' matches 'AYES!'
'neither' doesn't match

Otherwise, a non-regex solution (some would say a more “Pythonic” one) is:

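def strip_prefix_suffix(text, prefix, suffix):
    # Slice bounds: skip the prefix and/or the suffix only when present
    left = len(prefix) if text.startswith(prefix) else 0
    right = -len(suffix) if text.endswith(suffix) else None
    # None means "neither matched", as opposed to an empty result ''
    return text[left:right] if left or right else None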

If there is no match, the function returns None to differentiate from a possible '' (e.g. when called as strip_prefix_suffix('AB', 'A', 'B')).
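
On Python 3.9 or later, roughly the same idea can be sketched with str.removeprefix and str.removesuffix. This is an alternative spelling, not the function above; the name strip_affixes is made up, and it assumes prefix and suffix are non-empty.

```python
def strip_affixes(text, prefix, suffix):
    """Strip prefix and/or suffix; return None if neither was present."""
    # removeprefix/removesuffix return the string unchanged when there is nothing to strip
    stripped = text.removeprefix(prefix).removesuffix(suffix)
    # An unchanged string means neither affix matched
    return stripped if stripped != text else None

print(repr(strip_affixes('A text here', 'A', 'B')))      # ' text here'
print(repr(strip_affixes('more text hereB', 'A', 'B')))  # 'more text here'
print(repr(strip_affixes('AB', 'A', 'B')))               # ''
print(repr(strip_affixes('neither', 'A', 'B')))          # None
```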

PS I should also say that this regex:

(?<=\AA).*(?=B\Z)|(?<=\AA).*|.*(?=B\Z)

should work, but it doesn't; it works just like the one I suggested, and I can't understand why. Breaking down the regex into parts, we can see something weird:

>>> text= 'AYES!B'
>>> re.compile('(?<=\\AA).*(?=B\\Z)').search(text).group(0)
'YES!'
>>> re.compile('(?<=\\AA).*').search(text).group(0)
'YES!B'
>>> re.compile('.*(?=B\\Z)').search(text).group(0)
'AYES!'
>>> re.compile('(?<=\\AA).*(?=B\\Z)|(?<=\\AA).*').search(text).group(0)
'YES!'
>>> re.compile('(?<=\\AA).*(?=B\\Z)|.*(?=B\\Z)').search(text).group(0)
'AYES!'
>>> re.compile('(?<=\\AA).*|.*(?=B\\Z)').search(text).group(0)
'AYES!'
>>> re.compile('(?<=\\AA).*(?=B\\Z)|(?<=\\AA).*|.*(?=B\\Z)').search(text).group(0)
'AYES!'

For some strange reason, the .*(?=B\\Z) subexpression takes precedence, even though it's the last alternative.
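
A likely explanation for the session above: re.search tries each start position from left to right and returns the first position at which any alternative matches. The two lookbehind alternatives can only start at index 1 (there has to be an "A" before them), while .*(?=B\Z) can already start at index 0, so it wins no matter where it sits in the alternation. The sketch below checks the start positions and shows one possible rewrite that consumes the "A" and "B" and uses groups instead of lookarounds; the names anchored and middle are hypothetical.

```python
import re

text = 'AYES!B'
lookaround = re.compile(r'(?<=\AA).*(?=B\Z)|(?<=\AA).*|.*(?=B\Z)')
first_only = re.compile(r'(?<=\AA).*(?=B\Z)')

print(lookaround.search(text).start())  # 0: the last alternative matches at index 0
print(first_only.search(text).start())  # 1: the lookbehind needs the A before it

# All alternatives now start at index 0, so their order decides which one wins.
anchored = re.compile(r'\A(?:A(.*)B|A(.*)|(.*)B)\Z')

def middle(s):
    m = anchored.match(s)
    if m is None:
        return None
    # Exactly one of the three groups took part in the match
    return next(g for g in m.groups() if g is not None)

print(middle('AYES!B'))           # YES!
print(middle('more text hereB'))  # more text here
print(middle('neither'))          # None
```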
