
Python pattern-matching. Match 'c[any number of consecutive a's, b's, or c's or b's, c's, or a's etc.]t'

Sorry about the title, I couldn't come up with a clean way to ask my question.

In Python I would like to match an expression 'c[some stuff]t', where [some stuff] could be any number of consecutive a's, b's, or c's and in any order.

For example, these work: 'ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat'

but these don't: 'cbcbbaat', 'caaccbabbt'

Edit: a's, b's, and c's are just an example, but I would really like to be able to extend this to more letters. I'm interested in regex and non-regex solutions.


Not thoroughly tested, but I think this should work:

import re

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
# Each iteration consumes one run of a, b or c and forbids that letter later on.
pat = re.compile(r'^c(?:([abc])\1*(?!.*\1))*t$')
for w in words:
    print(w, "matches" if pat.match(w) else "doesn't match")

#ct matches
#cat matches
#cbbt matches
#caaabbct matches
#cbbccaat matches
#cbcbbaat doesn't match
#caaccbabbt doesn't match

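This matches runs of a, b or c (that's the ([abc])\1* part), while the negative lookahead (?!.*\1) makes sure no other instance of that character appears anywhere after the run. Since every run forbids its own letter later in the string, each letter can form at most one run.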

(edit: fixed a typo in the explanation)
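To extend this to more letters, as the question's edit asks, the character class can be generated from an alphabet instead of being hard-coded. This is a minimal sketch, not part of the original answer: `run_pattern` is a hypothetical helper name, and it assumes the start and end markers are single characters that do not themselves appear in the alphabet.

```python
import re

def run_pattern(letters, start="c", end="t"):
    # Hypothetical helper: generalizes ^c(?:([abc])\1*(?!.*\1))*t$ to any alphabet.
    # Assumes `start` and `end` are not members of `letters`.
    char_class = "[" + "".join(re.escape(ch) for ch in letters) + "]"
    # One run of a letter, then a lookahead forbidding that letter later on.
    run = "(" + char_class + r")\1*(?!.*\1)"
    # fullmatch anchors both ends, so no ^ or $ is needed.
    return re.compile(re.escape(start) + "(?:" + run + ")*" + re.escape(end))

words = ["ct", "cat", "cbbt", "caaabbct", "cbbccaat", "cbcbbaat", "caaccbabbt"]
pat = run_pattern("abc")
for w in words:
    print(w, "matches" if pat.fullmatch(w) else "doesn't match")
```

Using `fullmatch` rather than `^...$` with `match` also avoids `$` accepting a trailing newline (with the original pattern, "ct\n" would match).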


Not sure how attached you are to regex, but here is a solution using a different method:

from itertools import groupby

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
for w in words:
    match = False
    if w.startswith('c') and w.endswith('t'):
        temp = w[1:-1]  # the characters between the leading 'c' and trailing 't'
        s = set(temp)
        # groupby yields one group per run, so equal counts mean one run per letter
        match = s <= set('abc') and len(s) == len(list(groupby(temp)))
    print(w, "matches" if match else "doesn't match")

The string matches if the set of middle characters is a subset of set('abc') and groupby() returns exactly as many groups as there are distinct characters, i.e. each letter forms a single consecutive run.
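The same groupby idea extends to any alphabet by passing the allowed letters in. A minimal sketch, assuming single-character start and end markers; `is_valid` and its parameters are hypothetical names, not from the answer above.

```python
from itertools import groupby

def is_valid(word, letters="abc", start="c", end="t"):
    # Hypothetical helper generalizing the groupby answer to any alphabet.
    # Assumes single-character start/end markers.
    if len(word) < 2 or word[0] != start or word[-1] != end:
        return False
    # One key per consecutive run, e.g. "aabba" -> ['a', 'b', 'a'].
    run_keys = [key for key, _ in groupby(word[1:-1])]
    # Every run must use an allowed letter, and no letter may start a second run.
    return set(run_keys) <= set(letters) and len(run_keys) == len(set(run_keys))

words = ["ct", "cat", "cbbt", "caaabbct", "cbbccaat", "cbcbbaat", "caaccbabbt"]
for w in words:
    print(w, "matches" if is_valid(w) else "doesn't match")
```

Because the run keys contain exactly the distinct middle characters, this performs the same check as the answer, just without building the intermediate set separately.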


I believe you need to explicitly encode all possible permutations of a's, b's and c's:

c(a*b*c*|b*a*c*|b*c*a*|c*b*a*|c*a*b*|a*c*b*)t

Note that this is an extremely inefficient query which may backtrack a lot.
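The alternation can be generated with `itertools.permutations` rather than typed out. This is a sketch under assumptions: `permutation_pattern` is a hypothetical helper, and the number of alternatives is n! for n letters, so it only suits small alphabets. The pattern is also unanchored, so with `re.search` a word like 'cbcbbaat' would match its substring 'cbbaat'; the sketch therefore uses `fullmatch`.

```python
import re
from itertools import permutations

def permutation_pattern(letters, start="c", end="t"):
    # One alternative per ordering, e.g. a*b*c*|a*c*b*|... for "abc".
    # len(letters)! alternatives, so this grows very quickly.
    alternatives = ["".join(re.escape(ch) + "*" for ch in order)
                    for order in permutations(letters)]
    return re.compile(re.escape(start) + "(?:" + "|".join(alternatives) + ")" + re.escape(end))

pat = permutation_pattern("abc")
words = ["ct", "cat", "cbbt", "caaabbct", "cbbccaat", "cbcbbaat", "caaccbabbt"]
for w in words:
    print(w, "matches" if pat.fullmatch(w) else "doesn't match")
```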


I don't know the Python regex engine, but it sounds like you just want to write out the 6 different possible orderings directly.

c(a*b*c*|a*c*b*|b*a*c*|b*c*a*|c*a*b*|c*b*a*)t


AFAIK there's no "compact" way of doing this...

c(a*(b*c*|c*b*)|b*(a*c*|c*a*)|c*(a*b*|b*a*))t
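One way to gain confidence that the hand-written patterns in this thread accept the same words is to compare them on every short string over the alphabet. A sketch, assuming the 'abc' alphabet; the cutoff `max_len=6` and the helper name `disagreements` are arbitrary choices for illustration.

```python
import re
from itertools import product

patterns = {
    "lookahead": r"c(?:([abc])\1*(?!.*\1))*t",
    "permutations": r"c(a*b*c*|b*a*c*|b*c*a*|c*b*a*|c*a*b*|a*c*b*)t",
    "factored": r"c(a*(b*c*|c*b*)|b*(a*c*|c*a*)|c*(a*b*|b*a*))t",
}
compiled = {name: re.compile(p) for name, p in patterns.items()}

def disagreements(max_len=6):
    # Collect words on which the patterns give different answers.
    bad = []
    for n in range(max_len + 1):
        for middle in product("abc", repeat=n):
            word = "c" + "".join(middle) + "t"
            results = {name: bool(p.fullmatch(word)) for name, p in compiled.items()}
            if len(set(results.values())) > 1:
                bad.append((word, results))
    return bad

print(disagreements())  # prints [] when all three patterns agree
```

An exhaustive check like this is only evidence up to the chosen length, but it is cheap (1093 words here) and catches typos in long alternations.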