Java split is eating my characters
I have a string like this:
String str = "la$le\\$li$lo";
I want to split it to get the following output:
"la", "le\\$li", "lo"
The \$ is an escaped $, so it should be left in the output.
But when I do
str.split("[^\\\\]\\$")
I get
"l", "le\\$l", "lo"
From what I can tell, my regex is matching a$ and i$ and removing them. Any idea how to get my characters back?
Thanks
Use zero-width matching assertions:
String str = "la$le\\$li$lo";
System.out.println(java.util.Arrays.toString(
str.split("(?<!\\\\)\\$")
)); // prints "[la, le\$li, lo]"
The regex is essentially
(?<!\\)\$
It uses a negative lookbehind to assert that the $ is not preceded by a \.
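To make the difference concrete, here is a small sketch that runs the question's pattern and the lookbehind pattern on the same input. The class and method names (SplitComparison, splitConsuming, splitLookbehind) are made up for illustration.

```java
import java.util.Arrays;

class SplitComparison {
    // Pattern from the question: the character before '$' is part of the match,
    // so split() discards it together with the '$'.
    static String[] splitConsuming(String s) {
        return s.split("[^\\\\]\\$");
    }

    // Lookbehind pattern: only the '$' is matched; the preceding character
    // is inspected but stays in the output.
    static String[] splitLookbehind(String s) {
        return s.split("(?<!\\\\)\\$");
    }

    public static void main(String[] args) {
        String str = "la$le\\$li$lo";
        System.out.println(Arrays.toString(splitConsuming(str)));  // [l, le\$l, lo]
        System.out.println(Arrays.toString(splitLookbehind(str))); // [la, le\$li, lo]
    }
}
```

Note that the consuming pattern also cannot match a $ at the very start of the string, because it requires a character in front of it; the lookbehind has no such requirement.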
See also
- regular-expressions.info/Lookarounds
More examples of splitting on assertions
Simple sentence splitting, keeping punctuation marks:
String str = "Really?Wow!This.Is.Awesome!";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[.!?])")
)); // prints "[Really?, Wow!, This., Is., Awesome!]"
Splitting a long string into fixed-length parts, using \G (which matches at the end of the previous match):
String str = "012345678901234567890";
System.out.println(java.util.Arrays.toString(
str.split("(?<=\\G.{4})")
)); // prints "[0123, 4567, 8901, 2345, 6789, 0]"
Using a lookbehind/lookahead combo:
String str = "HelloThereHowAreYou";
System.out.println(java.util.Arrays.toString(
str.split("(?<=[a-z])(?=[A-Z])")
)); // prints "[Hello, There, How, Are, You]"
Related questions
- Can you use zero-width matching regex in String split?
- Backreferences in lookbehind
- How do I convert CamelCase into human-readable names in Java?
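The reason a$ and i$ are getting removed is that the regexp [^\\]\$
matches any character that is not '\' followed by '$', and split discards everything the delimiter pattern matched. You need to use zero-width assertions, which check the neighbouring character without making it part of the match.
This is the same problem people have when trying to find a q not followed by a u.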
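As a rough illustration of the q/u case, the sketch below compares a consuming character class with a negative lookahead. The class name QNotU and the helper firstMatch are hypothetical, chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QNotU {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "q[^u]" needs a real character after the q and includes it in the match.
        System.out.println(firstMatch("q[^u]", "qat"));   // qa
        System.out.println(firstMatch("q[^u]", "Iraq"));  // null (nothing follows the q)
        // "q(?!u)" only looks at what follows; the match is the q alone.
        System.out.println(firstMatch("q(?!u)", "qat"));  // q
        System.out.println(firstMatch("q(?!u)", "Iraq")); // q
        System.out.println(firstMatch("q(?!u)", "quit")); // null
    }
}
```

The same reasoning applies to the split in the question, only with a lookbehind instead of a lookahead, since the character to check sits before the $ rather than after it.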
A first cut at the proper regexp is /(?<!\\)\$/ (written as "(?<!\\\\)\\$" in a Java string literal):
class Test {
    public static void main(String[] args) {
        // Negative lookbehind: split on a $ that is not preceded by a backslash.
        String regexp = "(?<!\\\\)\\$";
        System.out.println(java.util.Arrays.toString("la$le\\$li$lo".split(regexp)));
    }
}
Yields:
[la, le\$li, lo]
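You can try first replacing each escaped \$ with another string, such as the URL encoding for $ ("%24"), then splitting on the remaining $ and restoring the escapes afterwards:
String[] splits = str.replace("\\$", "%24").split("\\$");
for (int i = 0; i < splits.length; i++) {
    // Restore the escaped dollar sign; this assumes the input never contains a literal %24.
    splits[i] = splits[i].replace("%24", "\\$");
}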
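More generally, if str is constructed by something like
str = a + "$" + b + "$" + c
then you can URL-encode a, b and c before appending them together, so that every $ left in str is a separator:
import static java.net.URLEncoder.encode;
...
// The two-argument encode throws the checked UnsupportedEncodingException.
str = encode(a, "UTF-8") + "$" + encode(b, "UTF-8") + "$" + encode(c, "UTF-8");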
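A minimal round-trip sketch of that idea follows, assuming you control both the code that builds the string and the code that splits it. DollarJoin and its methods are hypothetical names; only URLEncoder and URLDecoder come from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DollarJoin {
    // Encodes each field so it cannot contain a raw '$', then joins with '$'.
    static String join(List<String> fields) {
        List<String> encoded = new ArrayList<>();
        for (String f : fields) {
            encoded.add(encode(f));
        }
        return String.join("$", encoded);
    }

    // Every '$' is a separator now, so a plain split is enough; then decode each field.
    static List<String> split(String joined) {
        List<String> fields = new ArrayList<>();
        for (String part : joined.split("\\$", -1)) { // -1 keeps trailing empty fields
            fields.add(decode(part));
        }
        return fields;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String joined = join(Arrays.asList("la", "le$li", "lo"));
        System.out.println(joined);        // la$le%24li$lo
        System.out.println(split(joined)); // [la, le$li, lo]
    }
}
```

With this scheme the backslash escape is no longer needed at all, at the cost of the joined string being less readable when fields contain many special characters.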
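You can see what the original pattern consumes by printing its groups:
import java.util.regex.*;
public class Test {
    public static void main(String... args) {
        String str = "la$le\\$li$lo";
        // group(1): text before the delimiter; group(2): what the delimiter pattern consumed
        Pattern p = Pattern.compile("(.+?)([^\\\\]\\$)");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group(1));
            System.out.println(m.group(2));
        }
    }
}
gives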
l
a$
le\$l
i$
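If you would rather collect the fields with a Matcher than call split, one possible sketch is to match the fields themselves instead of the delimiters. This is a different technique from the lookbehind answers above, and the names FieldScanner and fields are hypothetical. It treats a backslash plus the following character as one unit, so an escaped backslash in front of a $ does not hide the separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldScanner {
    // A field is a run of: any character other than '$' and '\',
    // or a '\' followed by any single character (the escaped character).
    private static final Pattern FIELD = Pattern.compile("(?:[^$\\\\]|\\\\.)+");

    static List<String> fields(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(s);
        while (m.find()) {
            out.add(m.group()); // escapes are kept as written, e.g. le\$li
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("la$le\\$li$lo")); // [la, le\$li, lo]
        System.out.println(fields("a\\\\$b"));       // [a\\, b]
    }
}
```

Unlike split, this sketch skips empty fields (for example between two adjacent $ characters) and drops a lone backslash at the very end of the input, so it only fits if those cases do not matter to you.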