
Java Matcher and Pattern: Why does this go on forever?

// remove multiple WITH
pat = Pattern.compile("ACCEPT .*?\\.", Pattern.DOTALL);
m = pat.matcher(str);
while (m.find())
{
    int start = m.group().indexOf("WITH") + 1;
    String part = m.group().substring(start);
    part = part.replaceAll("WITH", "");
    part = m.group().substring(0, start).concat(part);

    if (!m.group().equals(part))
    {
        str = m.replaceFirst(part);
    }
}

Any idea why this is an infinite loop? m.group() is never equal to part, and I don't know why. I tried reset(), but it made no difference.


I have no idea what you are trying to accomplish, but there is a bug here:

if(!m.group().equals(part))
{
    str=m.replaceFirst(part);
}

You are reassigning str, while the matcher still works on the original value of str. Strings are immutable: if you reassign the variable in one place, it doesn't change the reference held in another (see Passing Reference Data Type Arguments on this page of the Sun Java Tutorial).
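To make the point above concrete, here is a minimal sketch. The class name, method name and sample input are made up for illustration; only the pattern comes from the question. It relies on the documented behavior that Matcher.replaceFirst first resets the matcher and then replaces the first match of the original input. Reassigning str therefore has no effect on what the matcher sees, and the following find() lands on the very same statement again, which is most likely what keeps the loop spinning whenever the statement with several WITHs is not the first one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaleMatcherDemo {
    // Same pattern as in the question.
    static final Pattern STATEMENT = Pattern.compile("ACCEPT .*?\\.", Pattern.DOTALL);

    /** Returns { replaced string, group seen before replaceFirst, group seen after it }. */
    static String[] replaceInsideLoop(String str) {
        Matcher m = STATEMENT.matcher(str);
        m.find();                               // first statement
        m.find();                               // second statement
        String before = m.group();
        String replaced = m.replaceFirst("X");  // resets m, then replaces the FIRST match
        m.find();                               // continues after the first match of the ORIGINAL input
        String after = m.group();
        return new String[] { replaced, before, after };
    }

    public static void main(String[] args) {
        // Hypothetical input: the second statement is the one with several WITHs.
        String[] r = replaceInsideLoop("ACCEPT a. ACCEPT b WITH x WITH y.");
        System.out.println(r[0]);               // the first statement was replaced, not the second
        System.out.println(r[1].equals(r[2]));  // the matcher found the same statement again
    }
}
```

In the question's loop this means the second statement is found, judged different from part, "replaced" (in fact the first statement is), and then found again, without end.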

There are some more strange things going on, but perhaps I'm not understanding you correctly. You say in a comment that the string starts with ACCEPT and ends with a dot. But that is the only thing you are searching for with Pattern.compile("ACCEPT .*?\\.", Pattern.DOTALL), and you are not capturing anything either. Then why bother searching in the first place? I thought you knew that the input Strings are like that.

What you should really do is post some sample input and what data you want to extract from it. Otherwise no one will be able to really help you.


I am guessing now: you seem to want to remove multiple WITH clauses from your String. This should be much easier, something like this:

String test =
    "ACCEPT pasta "
       + "WITH tomatoes, parmesan cheese, olives "
       + "WITH anchovies WITH tuna WITH more olives.";

System.out.println(
    test.replaceAll(
        "(ACCEPT.*?WITH.*?)(?:\\s*WITH.*)(\\.)", "$1$2"
    )
);

Output:

ACCEPT pasta WITH tomatoes, parmesan cheese, olives.

Here's the Pattern, explained:

(       // start a capturing group
ACCEPT  // search for the literal ACCEPT
.*?     // search for the shortest possible matching String
        // (so no other WITH can sneak in)
WITH    // search for the literal WITH
.*?     // search for the shortest possible matching String
        // (so no other WITH can sneak in)
)       // close the capturing group, we'll refer to this
        // group as $1 or matcher.group(1)
(?:     // start a non-capturing group
\\s*    // search for optional whitespace
WITH    // search for the literal WITH
.*      // search for anything, greedily
)       // close the group, we'll discard this one
(       // open another capturing group
\\.     // search for a single period
)       // close the group, the period is now accessible as $2
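If it helps to see the two capturing groups from the breakdown above directly, the following sketch reads them through a Matcher instead of through "$1$2". The class and method names are hypothetical; the pattern and the sample sentence are the ones used in this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WithGroupsDemo {
    // Same expression as passed to replaceAll in the answer.
    static final Pattern P =
        Pattern.compile("(ACCEPT.*?WITH.*?)(?:\\s*WITH.*)(\\.)");

    /** Returns { group(1), group(2) }, or an empty array if nothing matches. */
    static String[] groups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        // The non-capturing (?: ... ) part gets no group number.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String test =
            "ACCEPT pasta "
               + "WITH tomatoes, parmesan cheese, olives "
               + "WITH anchovies WITH tuna WITH more olives.";
        for (String g : groups(test)) {
            System.out.println("[" + g + "]");
        }
    }
}
```

Concatenating the two returned groups gives the same text that replaceAll produces with "$1$2".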

Given your updated requirements (remove the WITHs but keep the args) here's an updated solution:

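final Matcher matcher =
    Pattern.compile("WITH\\s*", Pattern.DOTALL).matcher(test);
final StringBuffer sb = new StringBuffer();
while(matcher.find()){
    // sb is still empty only at the first match: keep that WITH,
    // replace every later one with the empty string
    matcher.appendReplacement(sb, sb.length() == 0
        ? matcher.group()
        : "");
}
// copy whatever follows the last match
matcher.appendTail(sb);
System.out.println(sb.toString());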

Output:

ACCEPT pasta WITH tomatoes, parmesan cheese, olives anchovies tuna more olives.
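If the input can contain several ACCEPT ... . statements, as the loop in the question suggests, the same appendReplacement/appendTail idea could be applied once per statement. This is only a sketch under that assumption: the class and method names are invented here, and the statement pattern is taken from the question. Each replacement is built from the current match and wrapped in Matcher.quoteReplacement, so a stray $ or backslash in the text is not treated as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WithCleaner {
    // Statement pattern from the question: "ACCEPT " up to the next period.
    private static final Pattern STATEMENT =
        Pattern.compile("ACCEPT .*?\\.", Pattern.DOTALL);

    /** Keeps the first WITH of one statement and drops the later ones (arguments stay). */
    static String keepFirstWith(String statement) {
        int first = statement.indexOf("WITH");
        if (first < 0) {
            return statement;                       // no WITH at all: nothing to do
        }
        int afterFirst = first + "WITH".length();
        return statement.substring(0, afterFirst)
             + statement.substring(afterFirst).replaceAll("WITH\\s*", "");
    }

    /** Rewrites every statement in one pass over the input. */
    static String clean(String input) {
        Matcher m = STATEMENT.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The matcher is never reset inside the loop, so each statement is visited once.
            m.appendReplacement(sb, Matcher.quoteReplacement(keepFirstWith(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("ACCEPT a. ACCEPT b WITH x WITH y."));
    }
}
```

Because the result is accumulated in sb rather than written back into the string being scanned, the loop terminates after the last statement.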
