How can I have two wildcards in this regex expression?
trying to get the following regex: <- bad english from me :(
I'm trying to get the following input text converted to a regex...
xx.*.aaa.bbb*
where * are wildcards .. as in .. they represent wildcards to me .. not regex syntax.
Any suggestions, please?
Update - example inputs.
- xx.zzzzzzzzz.aaa.bbb = match
- xx.eee.aaa.bbbzzzz = match
- xx.eee.aaa.bbb.zzzz = match
- xx.aaa.bbb = not a match
You have misunderstood what * means in regular expressions.
I think what you are looking for is:
xx\..*\.aaa\.bbb.*
The thing is:
- A . is not a literal dot. It means any character, so if you want to match a literal . you must escape it: \.
- A * means that the character that precedes it will be matched 0 or more times. So how do you emulate the wildcard you are looking for? Use .* which matches any character 0 or more times.
If you want to match the entire string, and not just any substring that matches the pattern, you have to include ^ at the beginning and $ at the end, so your regex will be:
^xx\..*\.aaa\.bbb.*$
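As a quick check, here is a minimal sketch that runs the anchored pattern above against the example inputs from the question. Python is my choice for illustration only (the question does not name a language), and the names `pattern`, `samples` and `results` are made up for this sketch.

```python
import re

# The anchored pattern from the answer above.
pattern = re.compile(r"^xx\..*\.aaa\.bbb.*$")

# Example inputs taken from the question.
samples = [
    "xx.zzzzzzzzz.aaa.bbb",
    "xx.eee.aaa.bbbzzzz",
    "xx.eee.aaa.bbb.zzzz",
    "xx.aaa.bbb",
]

# Map each input to True (match) or False (no match).
results = {s: pattern.search(s) is not None for s in samples}

for text, matched in results.items():
    print(f"{text!r}: {'match' if matched else 'not a match'}")
```

The first three inputs should match and xx.aaa.bbb should not, because there is no second ".aaa" left for the escaped dot to land on.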
Try this expression:
^xx\.[^\.]+\.aaa\.bbb.*
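Assuming that you mean * as a wildcard in the 'normal sense', and that your string isn't an attempt at regex, I'd say that xx\..+\.aaa\.bbb.+
is what you're after. Note that the trailing .+ requires at least one character after bbb, so it will not match xx.zzzzzzzzz.aaa.bbb from your examples; use .* at the end if nothing has to follow.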
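To see how + and * differ at the end of the pattern, here is a small sketch in Python (my choice of language, not the asker's). The variable names are hypothetical, and the second pattern is just the one above with the final .+ swapped for .* for comparison.

```python
import re

# Pattern as given above: both wildcards require at least one character.
plus_pattern = re.compile(r"xx\..+\.aaa\.bbb.+")
# Variant for comparison: the trailing wildcard may be empty.
star_pattern = re.compile(r"xx\..+\.aaa\.bbb.*")

bare = "xx.zzzzzzzzz.aaa.bbb"   # nothing follows "bbb"
suffixed = "xx.eee.aaa.bbbzzzz"  # something follows "bbb"

plus_bare = plus_pattern.fullmatch(bare) is not None
star_bare = star_pattern.fullmatch(bare) is not None
plus_suffixed = plus_pattern.fullmatch(suffixed) is not None
star_suffixed = star_pattern.fullmatch(suffixed) is not None

print(plus_bare, star_bare)          # False True
print(plus_suffixed, star_suffixed)  # True True
```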
What you refer to as "wildcard -- not regex syntax" comes from globbing. It's a pattern matching technique that was popularized in the first Unix version in the late 60's. Originally it was a separate program, called glob, that produced a result that could be piped to other programs. Now bash, MS-DOS and almost any shell has this feature built in. In globbing, * normally means match any character, any number of times.
The regex syntax is different. The .* idiom in regex is similar to the * in globbing, but not exactly the same. Normally, .* doesn't match line breaks. You usually have to set single-line mode (called multiline mode in Ruby) if you want .* to match any character, any number of times in regex.
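The difference can be sketched in Python, assuming its standard fnmatch module as a stand-in for shell globbing and re.DOTALL as its spelling of single-line mode. The sample strings and variable names are invented for this illustration.

```python
import fnmatch
import re

# Glob-style matching: "." is literal and "*" matches any run of characters,
# so the asker's original text works unchanged as a glob pattern.
glob_pattern = "xx.*.aaa.bbb*"
glob_ok = fnmatch.fnmatchcase("xx.eee.aaa.bbbzzzz", glob_pattern)
glob_rejected = fnmatch.fnmatchcase("xx.aaa.bbb", glob_pattern)

# Regex matching: by default "." stops at a line break.
text = "bbb\nzzz"
default_match = re.fullmatch(r"bbb.*", text)
# With DOTALL, "." also matches the newline, so ".*" spans both lines.
dotall_match = re.fullmatch(r"bbb.*", text, flags=re.DOTALL)

print(glob_ok, glob_rejected)          # True False
print(default_match is not None)       # False
print(dotall_match is not None)        # True
```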
In regex, * is not a wildcard; it means the preceding character is repeated 0, 1 or many times.
And the dot can be any character.
UPDATE:
You can try this
^xx\.[a-z]+\.aaa\.bbb\.?[a-z]*
and you can test it online, for example on Rubular.
[a-z] is a character class, in which you define which characters are allowed (or not allowed, using [^a-z]). So if you are only looking for lowercase letters, you can use [a-z].
The + means the preceding item has to be there at least once.
The \.? near the end means there can be a dot or not.
The ^ at the beginning means to match at the start of the string.
A nice tutorial (for Perl, but the basics are nearly the same everywhere) is perlretut.
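If Rubular is not at hand, the character-class pattern above can be checked with a short Python sketch (Python is an assumption on my part; Rubular itself tests Ruby regexes, but this pattern uses only syntax common to both). The uppercase sample is an extra input I added to show the effect of [a-z].

```python
import re

# The pattern from the update above; there is no trailing $,
# so only the start of the string is anchored.
pattern = re.compile(r"^xx\.[a-z]+\.aaa\.bbb\.?[a-z]*")

samples = [
    "xx.zzzzzzzzz.aaa.bbb",
    "xx.eee.aaa.bbbzzzz",
    "xx.eee.aaa.bbb.zzzz",
    "xx.aaa.bbb",
    "xx.EEE.aaa.bbb",  # added example: uppercase is outside [a-z]
]

matches = {s: pattern.search(s) is not None for s in samples}

for text, matched in matches.items():
    print(f"{text!r}: {'match' if matched else 'not a match'}")
```

The first three inputs match; the last two do not, the fourth because the first wildcard must consume at least one letter, and the fifth because uppercase letters are not in the class.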