Python pattern-matching. Match 'c[any number of consecutive a's, b's, or c's or b's, c's, or a's etc.]t'
Sorry about the title, I couldn't come up with a clean way to ask my question.
In Python I would like to match an expression 'c[some stuff]t', where [some stuff] could be any number of consecutive a's, b's, or c's and in any order.
For example, these work: 'ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat'
but these don't: 'cbcbbaat', 'caaccbabbt'
Edit: a's, b's, and c's are just an example but I would really like to be able to extend this to more letters. I'm interested in regex and non-regex solutions.
Not thoroughly tested, but I think this should work:
import re
words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
pat = re.compile(r'^c(?:([abc])\1*(?!.*\1))*t$')
for w in words:
    print(w, "matches" if pat.match(w) else "doesn't match")
#ct matches
#cat matches
#cbbt matches
#caaabbct matches
#cbbccaat matches
#cbcbbaat doesn't match
#caaccbabbt doesn't match
This matches runs of a, b or c (that's the ([abc])\1* part), while the negative lookahead (?!.*\1) makes sure no other instance of that character is present after the run.
(edit: fixed a typo in the explanation)
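Since the question asks about extending this to more letters, here is a minimal sketch of building the same pattern from an arbitrary set of allowed letters. The helper name build_run_pattern and its parameters are made up for illustration, and it assumes the closing letter is not itself one of the allowed middle letters (otherwise the lookahead would always see it and reject the run).

```python
import re

def build_run_pattern(letters, first='c', last='t'):
    """Hypothetical helper: same regex as above, for any set of letters."""
    # Character class holding the letters allowed between first and last.
    char_class = '[' + ''.join(re.escape(ch) for ch in letters) + ']'
    # Each iteration consumes one run of a single letter, then the negative
    # lookahead rejects the match if that letter shows up again later.
    # Assumption: `last` is not in `letters`, or its runs can never match.
    return re.compile(
        '^' + re.escape(first)
        + '(?:(' + char_class + r')\1*(?!.*\1))*'
        + re.escape(last) + '$'
    )

pat = build_run_pattern('abcd')
for w in ['cdt', 'caaddbt', 'cadat']:
    print(w, "matches" if pat.match(w) else "doesn't match")
```

With letters='abc' this should produce the same pattern string as the hand-written one above.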
Not sure how attached you are to regex, but here is a solution using a different method:
from itertools import groupby
words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat', 'cbcbbaat', 'caaccbabbt']
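for w in words:
    match = False
    if w.startswith('c') and w.endswith('t'):
        temp = w[1:-1]  # everything between the leading 'c' and trailing 't'
        s = set(temp)
        # one group per distinct letter means no letter starts a second run
        match = s <= set('abc') and len(s) == len(list(groupby(temp)))
    print(w, "matches" if match else "doesn't match")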
The string matches if a set of the middle characters is a subset of set('abc') and the number of groups returned by groupby() is the same as the number of elements in the set.
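The same check can be wrapped in a function so the allowed letters become a parameter. This is only a sketch: the name matches_runs and its arguments are made up here, and it assumes the opening and closing markers are single characters.

```python
from itertools import groupby

def matches_runs(word, letters='abc', first='c', last='t'):
    """Hypothetical helper: True if each middle letter forms a single run."""
    # Assumption: `first` and `last` are single characters.
    if len(word) < 2 or not (word.startswith(first) and word.endswith(last)):
        return False
    middle = word[1:-1]
    seen = set(middle)
    # groupby yields one group per run of identical consecutive characters.
    run_count = sum(1 for _ in groupby(middle))
    # Only allowed letters, and exactly one run per distinct letter.
    return seen <= set(letters) and run_count == len(seen)

for w in ['ct', 'cbbccaat', 'cbcbbaat', 'cddaat']:
    print(w, matches_runs(w), matches_runs(w, letters='abcd'))
```

Extending to more letters is then just a matter of passing a longer letters string.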
I believe you need to explicitly encode all possible permutations of a's, b's and c's:
c(a*b*c*|b*a*c*|b*c*a*|c*b*a*|c*a*b*|a*c*b*)t
Note that this is an extremely inefficient query which may backtrack a lot.
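If you do go this route, the alternation does not have to be typed out by hand. Here is a rough sketch that generates it with itertools.permutations (the helper name build_permutation_pattern is made up for illustration). Keep in mind the number of alternatives is n! for n letters, so this is probably only practical for a handful of letters.

```python
import re
from itertools import permutations

def build_permutation_pattern(letters, first='c', last='t'):
    """Hypothetical helper: one alternative per ordering of the letters."""
    alternatives = (
        ''.join(re.escape(ch) + '*' for ch in order)
        for order in permutations(letters)
    )
    return re.compile(
        re.escape(first) + '(?:' + '|'.join(alternatives) + ')' + re.escape(last)
    )

pat = build_permutation_pattern('abc')
print(pat.pattern)
for w in ['caaabbct', 'cbcbbaat']:
    # fullmatch anchors the pattern at both ends of the string
    print(w, "matches" if pat.fullmatch(w) else "doesn't match")
```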
I don't know the Python regex engine, but it sounds like you just want to write out the 6 different possible orderings directly.
/c(a*b*c*|a*c*b*|b*a*c*|b*c*a*|c*a*b*|c*b*a*)t/
AFAIK there's no "compact" way of doing this...
c(a*(b*c*|c*b*)|b*(a*c*|c*a*)|c*(a*b*|b*a*))t
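For what it's worth, a quick brute-force comparison can give some confidence that the factored form accepts the same strings as the flat list of six orderings. This is just a sanity check over short strings, not a proof, and the function name patterns_agree is made up for the example.

```python
import re
from itertools import product

flat = re.compile(r'c(a*b*c*|a*c*b*|b*a*c*|b*c*a*|c*a*b*|c*b*a*)t')
factored = re.compile(r'c(a*(b*c*|c*b*)|b*(a*c*|c*a*)|c*(a*b*|b*a*))t')

def patterns_agree(max_len=6):
    """Compare both patterns on every 'c' + middle + 't' up to max_len."""
    for n in range(max_len + 1):
        for middle in product('abc', repeat=n):
            word = 'c' + ''.join(middle) + 't'
            if bool(flat.fullmatch(word)) != bool(factored.fullmatch(word)):
                return False
    return True

print(patterns_agree())
```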