How to match repeated patterns?
I would like to match:
some.name.separated.by.dots
But I don't have any idea how.
I can match a single part like this:
\w+\.
How can I say "repeat that"?
Try the following:
\w+(?:\.\w+)+
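The + after (?: ... ) tells the engine to match what is inside the parentheses one or more times, so the whole pattern reads "a word, followed by one or more repetitions of a dot and a word".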
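As a minimal sketch of what the pattern accepts when it must cover a whole string, the Java class below uses Matcher.matches(), which only succeeds if the entire input fits. The class name DottedNameCheck and the method isDottedName are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for this illustration.
final class DottedNameCheck {
    // \w+ is the first part; (?:\.\w+)+ repeats "dot, then word" one or more times.
    static final Pattern DOTTED = Pattern.compile("\\w+(?:\\.\\w+)+");

    // matches() succeeds only if the entire string fits the pattern.
    static boolean isDottedName(String s) {
        return DOTTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "some.name.separated.by.dots", // true
            "a.b",                         // true
            "single",                      // false: no dot at all
            "trailing.",                   // false: nothing after the last dot
            "two..dots"                    // false: empty part between the dots
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + isDottedName(s));
        }
    }
}
```

Note that single is rejected: the + after the group requires at least one dot.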
Note that in Java, \w only matches ASCII word characters (letters, digits and the underscore), so for a word like café, \w+ would match only caf, and words written entirely in non-Latin scripts would not be matched at all.
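A small Java sketch of that limitation, assuming Java 7 or later for the Pattern.UNICODE_CHARACTER_CLASS flag, which makes \w follow Unicode rules (the embedded flag (?U) at the start of the pattern has the same effect). The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
final class UnicodeWordDemo {
    // Returns the first run of word characters in s, or null if there is none.
    static String firstWord(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "café", written with an escape so the source file encoding does not matter.
        String word = "caf\u00e9";

        // By default Java's \w is [a-zA-Z_0-9], so the é is not a word character.
        Pattern ascii = Pattern.compile("\\w+");
        // UNICODE_CHARACTER_CLASS (Java 7+) makes \w Unicode-aware.
        Pattern unicode = Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);

        System.out.println(firstWord(ascii, word));   // caf
        System.out.println(firstWord(unicode, word)); // café
    }
}
```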
EDIT
The difference between [...] and (?:...) is that [...] always matches a single character. It is called a "character set" or "character class". So [abc] does not match the string "abc", but matches one of the characters a, b or c.
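A quick way to see the difference is to test both forms with Pattern.matches(), which requires the whole input to match. The class and helper names below are only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class ClassVsGroup {
    // True if the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // [abc] is a character class: exactly one character, a, b or c.
        System.out.println(whole("[abc]", "a"));         // true
        System.out.println(whole("[abc]", "abc"));       // false: three characters
        // (?:abc) is a group: the sequence a, b, c in that order.
        System.out.println(whole("(?:abc)", "abc"));     // true
        // Repeating each form gives different results.
        System.out.println(whole("(?:abc)+", "abcabc")); // true
        System.out.println(whole("(?:abc)+", "cab"));    // false
        System.out.println(whole("[abc]+", "cab"));      // true: any mix of a, b, c
    }
}
```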
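The reason \w+[\.\w+]* also matches your string is that [\.\w+] matches a single character: a ., a character from \w, or a literal + (inside a character class, + is not a quantifier). That single character is then repeated zero or more times by the * after it. But \w+[\.\w+]* will therefore also match strings like aaaaa or aaa..........., which are not dot-separated names.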
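Comparing the two patterns side by side makes the problem visible. This Java sketch (the class name is hypothetical) runs both against a few inputs with matches(); the last input, a+b, shows the literal + slipping through the character class.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class ClassPitfall {
    // The pattern from the question: the * repeats a one-character class.
    static final Pattern LOOSE = Pattern.compile("\\w+[\\.\\w+]*");
    // The pattern from the answer: the + repeats a "dot, then word" group.
    static final Pattern STRICT = Pattern.compile("\\w+(?:\\.\\w+)+");

    public static void main(String[] args) {
        String[] inputs = {"some.name.separated.by.dots", "aaaaa", "aaa...........", "a+b"};
        for (String s : inputs) {
            System.out.printf("%-28s loose=%b strict=%b%n", s,
                    LOOSE.matcher(s).matches(), STRICT.matcher(s).matches());
        }
    }
}
```

The loose pattern accepts all four inputs, while the strict one accepts only the dotted name.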
The (?:...) is, as I already mentioned, simply used to group characters (and possibly repeat those groups). Unlike a plain (...), it is non-capturing: the text it matches is not stored in a numbered group.
More info on character sets: http://www.regular-expressions.info/charclass.html
More info on groups: http://www.regular-expressions.info/brackets.html
EDIT II
Here's an example in Java (seeing you post mostly Java answers):
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            String text = "some.text.here only but not Some other " +
                    "there some.name.separated.by.dots and.we are done!";
            // A word followed by one or more ".word" parts.
            Pattern p = Pattern.compile("\\w+(?:\\.\\w+)+");
            Matcher m = p.matcher(text);
            // find() scans forward for the next match anywhere in the text.
            while (m.find()) {
                System.out.println(m.group());
            }
        }
    }
which will produce:
some.text.here
some.name.separated.by.dots
and.we
Note that m.group(0) and m.group() are equivalent: both mean "the entire match".
This will also work, at least when the dotted name is the whole input:
(\w+(\.|$))+
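A sketch of how this alternative behaves in Java (class and method names are illustrative). Because each part may end either in a dot or at the end of the input, it also accepts a single word and a trailing dot, and with find() on a longer text it can return fragments such as some. rather than a full name. It also uses capturing groups, and a repeated capturing group only keeps its last repetition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
final class AlternativePattern {
    // Each part is a word followed by either a dot or the end of the input.
    static final Pattern ALT = Pattern.compile("(\\w+(\\.|$))+");

    // True if the whole string fits the pattern.
    static boolean wholeMatch(String s) {
        return ALT.matcher(s).matches();
    }

    // Collects every match that find() reports in a longer text.
    static List<String> findAll(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ALT.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Group 1 after a whole-string match: only the last repetition is kept.
    static String lastPart(String s) {
        Matcher m = ALT.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("some.name.separated.by.dots")); // true
        System.out.println(wholeMatch("single"));                      // true
        System.out.println(wholeMatch("trailing."));                   // true
        System.out.println(findAll("see some.name here"));             // [some., here]
        System.out.println(lastPart("some.name.separated.by.dots"));   // dots
    }
}
```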
You can use ? to match zero or one occurrence of the preceding part, * to match zero or more occurrences, and + to match one or more.
So (\w\.)? will match w. (a single word character followed by a dot) as well as the empty string, (\w\.)* will match r.2.5.3.1.s.r.g.s. as well as the empty string, and (\w\.)+ will match any of the above except the empty string.
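The three quantifiers can be checked directly with Pattern.matches(), which must consume the whole input. The class and helper names here are illustrative.

```java
import java.util.regex.Pattern;

// Hypothetical demo class.
final class QuantifierDemo {
    // True if the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String[] regexes = {"(\\w\\.)?", "(\\w\\.)*", "(\\w\\.)+"};
        String[] inputs = {"", "w.", "r.2.5.3.1.s.r.g.s."};
        for (String r : regexes) {
            for (String s : inputs) {
                // Quote the input so the empty string is visible in the output.
                System.out.println(r + " on \"" + s + "\" -> " + whole(r, s));
            }
        }
    }
}
```

With ?, the long input fails (it has more than one part); with *, all three inputs pass; with +, only the empty string fails.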
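To get close to your example, you can use
(\w+\.)+
which means "match at least one word character (a letter, digit or underscore), then a period, and repeat that at least once". Because every part must end with a period, on some.name.separated.by.dots it matches only some.name.separated.by. and leaves out the final dots.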
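A short Java check of (\w+\.)+ on the example name (class and method names are hypothetical), using find() for the first match and matches() for the whole string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
final class TrailingDotPattern {
    // Every repetition must be a word followed by a period.
    static final Pattern P = Pattern.compile("(\\w+\\.)+");

    // Returns the first match of (\w+\.)+ in s, or null if there is none.
    static String firstMatch(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "some.name.separated.by.dots";
        System.out.println(firstMatch(s));          // some.name.separated.by.
        System.out.println(P.matcher(s).matches()); // false: "dots" has no period after it
    }
}
```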