Regexp grouping and replaceAll with .* in Java duplicates the replacement

I have a problem using regexp in Java. The example code prints ABC_012_suffix_suffix, but I was expecting it to print ABC_012_suffix:

    // needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
    Pattern regexp  = Pattern.compile("(.*)");
    Matcher matcher = regexp.matcher("ABC_012");
    String  result  = matcher.replaceAll("$1_suffix");

    System.out.println(result);

I understand that replaceAll replaces every match of the pattern. The question is: why does the group (.*) match twice on my string ABC_012 in Java?


Pattern regexp  = Pattern.compile(".*");
Matcher matcher = regexp.matcher("ABC_012");
matcher.matches();
System.out.println(matcher.group(0));
System.out.println(matcher.replaceAll("$0_suffix"));

The same happens here; the output is:

ABC_012
ABC_012_suffix_suffix

The reason is hidden in the replaceAll method: it tries to find all subsequences that match the pattern:

while (matcher.find()) {
  System.out.printf("Start: %s, End: %s%n", matcher.start(), matcher.end());
}

This will result in:

Start: 0, End: 7
Start: 7, End: 7

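So, perhaps surprisingly, the matcher finds two subsequences: "ABC_012" and an empty string "". After the first match ends at index 7, the matcher searches again from index 7, and since .* is allowed to match zero characters, it matches the empty string at the end of the input. replaceAll then appends "_suffix" to both matches: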

"ABC_012" + "_suffix" + "" + "_suffix"

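To make the two matches easier to see, here is a minimal sketch that collects everything find() reports for a given pattern. The class name EmptyMatchDemo and the helper describeMatches are made up for this illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatchDemo {
    // Hypothetical helper: lists every match that find() reports,
    // formatted as "start-end:matchedText".
    static List<String> describeMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // .* matches the whole input, then once more as an empty match at index 7
        System.out.println(describeMatches(".*", "ABC_012")); // [0-7:ABC_012, 7-7:]
        // .+ needs at least one character, so there is no empty match at the end
        System.out.println(describeMatches(".+", "ABC_012")); // [0-7:ABC_012]
    }
}
```

Since replaceAll substitutes every match it finds, each entry in the first list receives its own "_suffix".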

Probably .* first gives you the "full match" and then, at the end of the input, matches once more as an "empty match" (which is still a match). Try (.+) or (^.*$) instead. Both work as expected.

At regexinfo, the star is defined as follows:

*(star) - Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with less matches of the preceding item, up to the point where the preceding item is not matched at all.
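
The alternatives suggested above can be compared side by side in a short sketch. The class SuffixDemo and its method names are invented for illustration, and the replaceFirst variant is an extra option that this answer does not mention; it sidesteps the problem because only the first match is replaced.

```java
public class SuffixDemo {
    // (.+) requires at least one character, so the empty match at the end never occurs.
    static String viaPlus(String input) {
        return input.replaceAll("(.+)", "$1_suffix");
    }

    // ^ only matches at the start of the input (no MULTILINE flag here),
    // so a second match starting at the end of "ABC_012" is impossible.
    static String viaAnchors(String input) {
        return input.replaceAll("(^.*$)", "$1_suffix");
    }

    // replaceFirst stops after the first match, so the trailing empty match is ignored.
    static String viaReplaceFirst(String input) {
        return input.replaceFirst("(.*)", "$1_suffix");
    }

    public static void main(String[] args) {
        System.out.println(viaPlus("ABC_012"));         // ABC_012_suffix
        System.out.println(viaAnchors("ABC_012"));      // ABC_012_suffix
        System.out.println(viaReplaceFirst("ABC_012")); // ABC_012_suffix
    }
}
```

One difference worth keeping in mind: for an empty input, (.+) matches nothing and leaves the string unchanged, while the other two variants still append the suffix.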


If you just want to add "_suffix" to your input why don't you just do:

String result = "ABC_012" + "_suffix";

?
