Java replaceAll() & split() irregularities

I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.

I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:

String[] tokens = input.split("[^\\[\\d\\]]");

which produced the following:

[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]

Oh dear. So, I thought, "what would replaceAll do in this instance?":

String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");

which produced:

[0][1]

Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadocs say that both methods use the Pattern class?

To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters and am I right in thinking the replaceAll works because the regex replaces all characters not matching "[", a number and "]"? What am I missing that means I expect them to produce similar output (OK that's three, and please don't answer "a clue?" to this one!).


Well, from what I can see, split does work: it breaks the string at every match, and your regex matches any single character that is not "[", a digit or "]". Adjacent matches have nothing between them, which is where the empty strings come from.

As for replaceAll, I think your assumption is right: it removes everything that is not what you want by replacing each match with "".

From the API documentation:

Splits this string around matches of the given regular expression.

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

The string "boo:and:foo", for example, yields the following results with these expressions:

Regex     Result
:     { "boo", "and", "foo" }
o     { "b", "", ":and:f" }
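The same rule explains the output in the question. Here is a minimal sketch, assuming a shortened input string so the result is easy to count by hand; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Same character class as in the question: any single character that is
    // not '[', a digit, or ']' is treated as a delimiter.
    static final String REGEX = "[^\\[\\d\\]]";

    static String[] splitOnNonIndexChars(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // 'a', '.' and 'b' are three adjacent delimiters, so split reports the
        // empty strings before and between them. The empty strings after "[1]"
        // are trailing, so the zero limit drops them.
        String[] tokens = splitOnNonIndexChars("a.b[0].c[1].d");
        System.out.println(Arrays.toString(tokens)); // prints [, , , [0], , [1]]
    }
}
```

The "spaces" in the printed array are just the ", " separators that Arrays.toString puts between empty strings.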


This is not a direct answer to your question, however I want to show you a great API that will suit your need.

Check out Splitter from Google Guava.

So for your example, you would use it like this:

Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);

//Now you get back an Iterable which you can iterate over. Much better than an Array.
for(String s : tokens) {
   System.out.println(s);
}

This prints:
[0]
[1]


split splits on boundaries defined by the regex you provide, so it's no great surprise you're getting lots of entries. Nearly every character in the string matches your regex and so, by definition, is a boundary on which a split should occur. Between two adjacent boundaries there is nothing, hence the empty strings.

replaceAll replaces matches for your regex with the replacement you give it, which in your case is an empty string.
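To make the contrast concrete, here is a small sketch that runs both operations on the string from the question. The second method adds a + quantifier so that each run of unwanted characters counts as one boundary; that variation is my own and is not in the question. The class and method names are hypothetical.

```java
import java.util.Arrays;

class ReplaceVersusSplit {
    static final String INPUT = "stack.overflow.questions[0].answer[1].postDate";

    // Deletes every character that is not '[', a digit, or ']'.
    static String stripNonIndexChars(String input) {
        return input.replaceAll("[^\\[\\d\\]]", "");
    }

    // Treats each whole run of unwanted characters as a single delimiter.
    static String[] splitOnRuns(String input) {
        return input.split("[^\\[\\d\\]]+");
    }

    public static void main(String[] args) {
        System.out.println(stripNonIndexChars(INPUT));            // prints [0][1]
        System.out.println(Arrays.toString(splitOnRuns(INPUT)));  // prints [, [0], [1]]
    }
}
```

Even with the quantifier, split still returns a leading empty string, because the input starts with a delimiter run. That is one reason matching the tokens directly tends to be simpler than splitting around them.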

If you're trying to grab the 0 and the 1, it's a trivial loop:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile("\\[(\\d+)\\]");
Matcher m = pat.matcher(text);
List<String> results = new ArrayList<String>();
while (m.find()) {
    results.add(m.group(1)); // Or just .group() if you want the [] as well
}
String[] tokens = results.toArray(new String[0]);

Or if it's always exactly two of them:

String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile(".*\\[(\\d+)\\].*\\[(\\d+)\\].*");
Matcher m = pat.matcher(text);
String[] tokens = new String[2];
if (m.find()) { // group() throws IllegalStateException if no match was found
    tokens[0] = m.group(1);
    tokens[1] = m.group(2);
}


The problem is that split is the wrong operation here.

In ruby, I'd tell you to string.scan(/\[\d+\]/), which would give you the array ["[0]","[1]"]

Java doesn't have a single-method equivalent, but we can write a scan method as follows:

public List<String> scan(String string, String regex){
   List<String> list = new ArrayList<String>();
   Pattern pattern = Pattern.compile(regex);
   Matcher matcher = pattern.matcher(string);
   while(matcher.find()) {
      list.add(matcher.group());
   }
   return list;
}

and we can call it as scan(string,"\\[\\d+\\]")
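If you are on Java 9 or later (an assumption; the answer above works on older versions too), Matcher.results() gives a stream of matches, so the same idea can be sketched without an explicit loop. The class name here is made up for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ScanWithStreams {
    // Collects the text of every match of regex in string, in order.
    static List<String> scan(String string, String regex) {
        return Pattern.compile(regex)
                .matcher(string)
                .results()                 // Stream<MatchResult>, added in Java 9
                .map(MatchResult::group)   // whole match, brackets included
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "stack.overflow.questions[0].answer[1].postDate";
        System.out.println(scan(input, "\\[\\d+\\]")); // prints [[0], [1]]
    }
}
```

Calling toArray(new String[0]) on the returned list gives the two-element array the question asks for.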

The equivalent Scala code is:

"""\[\d+\]""".r findAllIn string