Saving substrings using Regular Expressions
I'm new to regular expressions in Java (or any language, for that matter) and I want to do a find using them. The tricky part that I don't understand is how to replace something inside the string that matches.
For example, if the line I'm looking for is
Person item6 [can {item thing [wrap]}]
I'm able to write a regex that finds that line, but finding what the word "thing" is (as it may differ among different lines) is my problem. I may want to either replace that word with something else or save it in a variable for later. Is there any easy way to do this using Java's regex engine?
Yes. You wrap it in a "capturing group", which is just a pair of ( ) around the part of the regular expression that matches the interesting word.
Here is an example:
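import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WidgetExample {
    public static void main(String[] args) {
        // The parentheses around \\d+ form capturing group 1
        Pattern pat = Pattern.compile("testing (\\d+) widgets");
        String text = "testing 5 widgets";
        Matcher matcher = pat.matcher(text);
        // matches() succeeds only if the entire string matches the pattern
        if (matcher.matches()) {
            System.out.println("Widgets tested : " + matcher.group(1));
        } else {
            System.out.println("No match");
        }
    }
}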
Pattern and Matcher come from java.util.regex. There are some shortcuts in the String class, but these two classes are the most flexible.
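Applied to the line from the question, the same idea might look like the sketch below. The pattern is only a guess at what stays fixed around the interesting word ("{item " before it and " [" after it), and the names ExtractWord and extract are made up for illustration. It uses find() rather than matches(), because matches() only succeeds when the whole string matches the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractWord {
    // Assumption: the word of interest sits between "{item " and " [".
    private static final Pattern ITEM = Pattern.compile("\\{item (\\w+) \\[");

    // Returns the captured word, or null if the line has no such part.
    public static String extract(String line) {
        Matcher m = ITEM.matcher(line);
        // find() looks for the pattern anywhere in the line;
        // matches() would require the whole line to match.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "Person item6 [can {item thing [wrap]}]";
        String word = extract(line);   // saved in a variable for later use
        System.out.println(word);      // prints "thing"
    }
}
```

Returning null for "no match" keeps the sketch short; an Optional<String> would work just as well if you prefer that style.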
The problem specification isn't very clear, but here are some ideas that may work:
Use lookarounds and replaceAll/First
The following regex matches the \w+ that is preceded by the string "{item " and followed by the string " [". Lookarounds are used to match exactly the \w+ only. Metacharacters { and [ are escaped as necessary.
String text =
    "Person item6 [can {item thing [wrap]}]\n" +
    "Cat item7 [meow meow {item thang [purr]}]\n" +
    "Dog item8 [maybe perhaps {itemmmm thong [woof]}]";
String LOOKAROUND_REGEX = "(?<=\\{item )\\w+(?= \\[)";
System.out.println(
    text.replaceAll(LOOKAROUND_REGEX, "STUFF")
);
This prints:
Person item6 [can {item STUFF [wrap]}]
Cat item7 [meow meow {item STUFF [purr]}]
Dog item8 [maybe perhaps {itemmmm thong [woof]}]
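The heading mentions replaceFirst as well, which takes the same arguments but stops after the first match. A minimal sketch, reusing the lookaround regex from above (the class and method names are made up for illustration):

```java
public class ReplaceFirstDemo {
    // Same lookaround regex as above: a word preceded by "{item " and followed by " [".
    static final String LOOKAROUND_REGEX = "(?<=\\{item )\\w+(?= \\[)";

    // Replaces only the first word that the regex matches.
    public static String replaceFirstWord(String text, String replacement) {
        return text.replaceFirst(LOOKAROUND_REGEX, replacement);
    }

    public static void main(String[] args) {
        String text =
            "Person item6 [can {item thing [wrap]}]\n" +
            "Cat item7 [meow meow {item thang [purr]}]";
        // Only "thing" on the first line becomes STUFF; "thang" is left alone.
        System.out.println(replaceFirstWord(text, "STUFF"));
    }
}
```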
References
- regular-expressions.info/Lookarounds
- String.replaceAll(String regex, String replacement)
Use capturing groups instead of lookarounds
Lookarounds should be used judiciously. Lookbehinds in particular are very limited in Java. A more commonly applied technique is to use capturing groups to match more than just the interesting parts.
The following regex matches a pattern similar to the one before, \w+, but also includes the "{item " prefix and " [" suffix. Additionally, the m in item can repeat without limitation (something that can't be matched in a lookbehind in Java).
String CAPTURING_REGEX = "(\\{item+ )(\\w+)( \\[)";
System.out.println(
    text.replaceAll(CAPTURING_REGEX, "$1STUFF$3")
);
This prints:
Person item6 [can {item STUFF [wrap]}]
Cat item7 [meow meow {item STUFF [purr]}]
Dog item8 [maybe perhaps {itemmmm STUFF [woof]}]
Our pattern has 3 capturing groups:
(\{item+ )(\w+)( \[)
\________/\___/\___/
 group 1    2    3
Note that we can't simply replace what we matched with "STUFF", because we match some "extraneous" parts. We're not interested in replacing them, so we capture these parts and just put them back in the replacement string. The way we refer to what a group captured in replacement strings in Java is to use the $ sigil; thus the $1 and $3 in the above example.
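Since Java 7 the groups can also be given names, which may read better than counting parentheses. The sketch below is one way to write it: the group names prefix, word and suffix and the class NamedGroupsDemo are my own choices, not part of the original example. It also passes the new word through Matcher.quoteReplacement, so that a $ or \ in it is taken literally instead of being read as a group reference.

```java
import java.util.regex.Matcher;

public class NamedGroupsDemo {
    // Same shape as CAPTURING_REGEX, but every group has a (made-up) name.
    static final String NAMED_REGEX =
        "(?<prefix>\\{item+ )(?<word>\\w+)(?<suffix> \\[)";

    public static String replaceWord(String text, String replacement) {
        // ${name} refers to a named group, just as $1 refers to a numbered one.
        return text.replaceAll(NAMED_REGEX,
            "${prefix}" + Matcher.quoteReplacement(replacement) + "${suffix}");
    }

    public static void main(String[] args) {
        String text = "Dog item8 [maybe perhaps {itemmmm thong [woof]}]";
        System.out.println(replaceWord(text, "STUFF"));
    }
}
```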
References
- regular-expressions.info/Grouping
Use a Matcher for more flexibility
Not everything can be done with replacement strings. Java doesn't have postprocessing to capitalize a captured string, for example. In these more general replacement scenarios, you can use a Matcher loop like the following:
Matcher m = Pattern.compile(CAPTURING_REGEX).matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    System.out.println("Match found");
    // group(0) is the whole match; groups 1..groupCount() are the captures
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.printf("Group %d captured <%s>%n", i, m.group(i));
    }
    // %<s reuses the previous argument, %<S upper-cases it; the S after it is literal
    m.appendReplacement(sb,
        String.format("%s%s %<s and more %<SS%s",
            m.group(1), m.group(2), m.group(3)
        )
    );
}
// copy whatever follows the last match
m.appendTail(sb);
System.out.println(sb.toString());
The above prints:
Match found
Group 0 captured <{item thing [>
Group 1 captured <{item >
Group 2 captured <thing>
Group 3 captured < [>
Match found
Group 0 captured <{item thang [>
Group 1 captured <{item >
Group 2 captured <thang>
Group 3 captured < [>
Match found
Group 0 captured <{itemmmm thong [>
Group 1 captured <{itemmmm >
Group 2 captured <thong>
Group 3 captured < [>
Person item6 [can {item thing thing and more THINGS [wrap]}]
Cat item7 [meow meow {item thang thang and more THANGS [purr]}]
Dog item8 [maybe perhaps {itemmmm thong thong and more THONGS [woof]}]
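If you are on Java 9 or later, Matcher.replaceAll also accepts a function, which covers many of these cases without a manual loop. The following is only a sketch of that alternative, not what the example above does; the class and method names are invented for illustration. The string returned by the function is still treated as a replacement string, hence the quoteReplacement call.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionalReplaceDemo {
    // Same three groups as CAPTURING_REGEX above.
    static final Pattern CAPTURING =
        Pattern.compile("(\\{item+ )(\\w+)( \\[)");

    // Upper-cases the captured word and keeps the prefix and suffix (Java 9+).
    public static String shout(String text) {
        return CAPTURING.matcher(text).replaceAll(mr ->
            // quote the result so any $ or \ in it is taken literally
            Matcher.quoteReplacement(
                mr.group(1) + mr.group(2).toUpperCase(Locale.ROOT) + mr.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(shout("Person item6 [can {item thing [wrap]}]"));
    }
}
```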
References
- java.util.regex.Pattern
- java.util.regex.Matcher
  - group(int) - access individual captured strings
  - appendReplacement - unfortunately, StringBuffer-only before Java 9 (later versions add StringBuilder overloads)
- java.util.Formatter - used in printf and String.format in the above example
Attachments
- Source code of above example in ideone.com