Regex to have certain letters and at least one of a certain group of letters
Can someone help me with a regex for finding words that follow this rule?
The word needs to contain the letters J, U and G (in any order) and at least one of these letters: G, L, E, R, S.
So I can search a list for jugs, juggler, jugglers, juggles, etc.
Thanks
There is also a regex solution, but you should really say which language you are using, because there may be other, better solutions for your task, as @Quick Joe Smith wrote.
^(?=.*J)(?=.*U)(?=.*G)(?=.*[LERS]).*$
See on Rubular
Those (?=) groups are positive lookaheads: they check whether the character occurs in the string, but they don't consume it. The .* at the end then matches your complete string.
You also need the modifier i to turn on case-insensitive matching.
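As a rough illustration of the lookahead pattern above, here is a minimal Python sketch. The function name matches_rule and the sample word list are made up for this example, and re.IGNORECASE stands in for the i modifier.

```python
import re

# One lookahead per required letter, plus one for the "at least one of" group.
# re.IGNORECASE plays the role of the i modifier.
PATTERN = re.compile(r"^(?=.*J)(?=.*U)(?=.*G)(?=.*[LERS]).*$", re.IGNORECASE)

def matches_rule(word):
    """Return True if word contains J, U, G and at least one of L, E, R, S."""
    return PATTERN.match(word) is not None

words = ["jugs", "juggler", "jug", "glue", "Juggles"]
matching = [w for w in words if matches_rule(w)]
print(matching)  # ['jugs', 'juggler', 'Juggles']
```

Note that the character class here is [LERS], so a word like "jugg" (where the extra letter is a second g) is not matched by this pattern.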
The first part of your question does not lend itself to regular expressions very well at all. The pattern will end up a convoluted mess, and only get worse as you add more required characters.
The second part, however, is trivial:
m/[glers]/i
So I would suggest implementing a solution in two parts. This depends on your language:
C# (using Linq)
// requires: using System.Linq;
var chars = "GJU"; // characters are sorted.
if (inputstring.ToUpper().Intersect(chars).OrderBy(c => c).SequenceEqual(chars)) {
    // do stuff if match.
}
Perl
my @chars = split '', 'GJU'; # Required characters.
my %input = map{($_, 1)} split '', uc $inputstring; # stores unique chars from string.
if (@chars == grep { exists $input{$_} } @chars) { # True only if every required char is a hash key.
    # Do stuff in here.
}
Python
chars = set('jug')
letters = set(inputstring.lower())  # lowercase so the check ignores case
if chars <= letters:  # every required character is present
    pass  # do something here
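To show how the two parts might fit together, here is a small Python sketch that combines the set check with the [glers] pattern. The names REQUIRED, ANY_OF and is_candidate are invented for this illustration.

```python
import re

REQUIRED = set("jug")                           # every one of these must appear
ANY_OF = re.compile(r"[glers]", re.IGNORECASE)  # at least one of these must appear

def is_candidate(word):
    """Part one: all required letters present. Part two: any letter from the group."""
    letters = set(word.lower())
    return REQUIRED <= letters and ANY_OF.search(word) is not None

words = ["jugs", "Juggler", "jut", "glue"]
print([w for w in words if is_candidate(w)])  # ['jugs', 'Juggler']
```

Because g is in both the required set and the [glers] class, part two always succeeds once part one has. If the extra letter has to be distinct from the required g, drop g from the class or count the letters instead.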
If you're working with one word at a time, try this:
boolean isMatch = s.matches(
    "(?i)^(?:J()|U()|G(?!.*G)()|[GLERS]()|\\w){4,}+$\\1\\2\\3\\4");
If you're searching for matches in a longer string:
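// requires java.util.regex.Pattern and java.util.regex.Matcher
// The lookahead uses \\w* rather than .* so it stays inside the current word.
Pattern p = Pattern.compile(
    "(?i)\\b(?:J()|U()|G(?!\\w*G)()|[GLERS]()|\\w){4,}+\\b\\1\\2\\3\\4");
Matcher m = p.matcher(s);
while (m.find()) {
    String foundString = m.group();
}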
Each time one of the first four alternatives - J(), U(), G() or [GLERS]() - matches something, the empty group following it "captures" nothing (i.e., an empty string). When the end of the string is reached, each of the backreferences - \1, \2, etc. - tries to match the same thing its corresponding group matched: nothing again.
Obviously, that will always succeed; you can always match nothing. The trick is that the backreference won't even try to match if its corresponding group didn't participate in the match. That is, if there's no j in the target string, the () in the J() alternative never gets involved. When the regex engine processes the \1 backreference later, it immediately reports failure because it knows that group hasn't participated in the match.
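The check-box idea can be tried in isolation with a simplified pattern. The sketch below uses Python's re module, which also fails a backreference to a group that took no part in the match; not every engine does (JavaScript, for example, lets such a backreference succeed). The pattern only tracks the three required letters, and the names CHECKBOXES and has_all_required are hypothetical.

```python
import re

# Each required letter is followed by an empty group. The backreferences
# \1 \2 \3 after $ succeed only if the matching group took part in the match.
CHECKBOXES = re.compile(r"^(?:j()|u()|g()|\w)+$\1\2\3", re.IGNORECASE)

def has_all_required(word):
    """Return True if word contains j, u and g in any order."""
    return CHECKBOXES.match(word) is not None

for word in ["jug", "guj", "Jungle", "jut", "glue"]:
    print(word, has_all_required(word))
```

Here "jut" and "glue" are rejected because one of the empty groups is never reached, so its backreference fails.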
In this way, the empty groups act like check boxes, and the backreferences make sure all the boxes have been checked. There's one wrinkle, though. Both the G() and [GLERS]() alternatives can match g; how do you make sure they both participate in the match when you need them to? The first regex I tried,
"(?i)^(?:J()|U()|G()|[GLERS]()|\\w){4,}+$\\1\\2\\3\\4"
...failed to match the word "jugg" because the G() alternative was consuming both g's; [GLERS]() never got a chance to participate. So I added the negative lookahead - (?!.*G) - and now it only matches the last g. If I had three alternatives that could match a g, I would have to add (?!.*G.*G) to the first one and (?!.*G) to the second. But realistically, I probably would have switched to a different approach (probably one not involving regexes) well before I reached that point. ;)
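For comparison, one possible non-regex approach is to count letters. This is only a sketch of what such an alternative might look like, not the answerer's own code. It assumes the strict reading of the rule, where the extra letter must be in addition to the required J, U and G (so "jugg" passes and "jug" does not). The function name matches_with_counts is made up.

```python
from collections import Counter

def matches_with_counts(word):
    """Use up one j, one u and one g, then look for any leftover g, l, e, r or s."""
    remaining = Counter(word.lower())
    for letter in "jug":
        if remaining[letter] == 0:
            return False          # a required letter is missing
        remaining[letter] -= 1    # tick off one occurrence
    return any(remaining[letter] > 0 for letter in "glers")

for word in ["jugg", "jug", "jugs", "juggler", "glue"]:
    print(word, matches_with_counts(word))
```

Adding more letters that overlap between the two sets needs no extra lookaheads here, since the counts handle repeated letters directly.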