
Parsing Numeric Values with Java's Regular Expression Classes

In Java, I'm attempting to parse data from an ASCII output file. A sample of the data is shown below. The values are formatted with precision 5 and scale 3, and there is no space between the values.

80.234 <- 1 value

71.01663.129 <- 2 values ...

67.09159.25353.997

56.02759.77859.25057.749

55.86558.46958.64861.72855.969

What regular expression pattern can I use to match the numeric values and split them into groups? The pattern (\d+\.\d{1,3}) matches a single value. However, when I add a quantifier for the number of values on the line, it does not give the expected result. For example, I expected the following to find 10 groups.

String testPattern = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";

// create a pattern to match the output
Pattern p = Pattern.compile("(\\d+\\.\\d{1,3}){10}");

Matcher m = p.matcher(testPattern);

if (m.find())
{
    String group = m.group();
}


If they're all identically formatted, perhaps it would be easier to just read in 6 characters at a time as a string, then use Double.parseDouble to convert each string to a double?
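A minimal sketch of that idea, assuming every value is exactly 6 characters wide (two digits, a dot, three digits). The class name FixedWidthParser and the method parseLine are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class FixedWidthParser {
    // Assumption: each value occupies exactly 6 characters ("dd.ddd").
    static final int FIELD_WIDTH = 6;

    // Cuts the line into 6-character slices and parses each slice as a double.
    static List<Double> parseLine(String line) {
        if (line.length() % FIELD_WIDTH != 0) {
            throw new IllegalArgumentException(
                "Line length is not a multiple of " + FIELD_WIDTH + ": " + line);
        }
        List<Double> values = new ArrayList<>();
        for (int i = 0; i < line.length(); i += FIELD_WIDTH) {
            values.add(Double.parseDouble(line.substring(i, i + FIELD_WIDTH)));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("71.01663.129")); // prints [71.016, 63.129]
    }
}
```

This avoids regular expressions entirely, but it breaks as soon as a value has a different width (for example a three-digit integer part), which is why the length check is there.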


You're expecting it to somehow break out the individual numbers because that's how you matched them, but it doesn't work that way. What your regex does is capture one number at a time and place it into group #1. Ten times it does this, each time overwriting the contents of group #1 with the new value. When it's done, group() returns the whole string as you discovered, while group(1) would return only the tenth number, 68.641.
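To see the overwriting for yourself, here is a small sketch. The class and method names are hypothetical, and the pattern is the one from the question with the repeat count passed in:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedGroupDemo {
    // Returns { whole match, contents of group 1 }, or null if nothing matched.
    static String[] wholeAndLastGroup(String input, int count) {
        Pattern p = Pattern.compile("(\\d+\\.\\d{1,3}){" + count + "}");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        // There is still only one capturing group, however often it repeats.
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = wholeAndLastGroup("71.01663.129", 2);
        System.out.println(r[0]); // prints 71.01663.129
        System.out.println(r[1]); // prints 63.129 (only the last repetition survives)
    }
}
```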

This is a common error, probably due to Java's lack of a built-in "find all matches" mechanism. .NET has its Matches() methods, PHP has preg_match_all(), Python has re.findall(), Perl and JavaScript have the /g modifier... every major flavor has a mechanism to return either an array of all matches or an iterator over the matches, or both. But in Java you're expected to call find() in a while loop, as @KennyTM demonstrated.
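For what it's worth, Java 9 and later do provide Matcher.results(), which returns a stream of all matches. A minimal sketch, assuming a Java 9+ runtime (the class name FindAllMatches and the method findAll are made up for illustration):

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class FindAllMatches {
    private static final Pattern NUMBER = Pattern.compile("\\d+\\.\\d{1,3}");

    // Collects every match of the pattern, in order of appearance.
    // Requires Java 9+, where Matcher.results() was introduced.
    static List<String> findAll(String input) {
        return NUMBER.matcher(input)
                     .results()
                     .map(MatchResult::group)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAll("71.01663.129")); // prints [71.016, 63.129]
    }
}
```

On older runtimes the find() loop remains the way to go.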

It's an annoying omission, but not really a surprising one, for Java. Its effect is to force us to write more verbose, less idiomatic code, which has been a Java hallmark from the very beginning. But if you really want to reduce this task to a one-liner, there's the old "split on a lookaround" trick:

String[] result = source.split("(?=\\B\\d{2}\\.\\d{3})");

...or:

String[] result = source.split("(?<=\\G\\d{2}\\.\\d{3})");
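Here are both variants wrapped in a runnable sketch (the class and method names are hypothetical). Both assume every value has the exact dd.ddd shape; anything else will not be split correctly:

```java
import java.util.Arrays;

final class LookaroundSplit {
    // Splits before every "dd.ddd" that does not start at a word boundary,
    // i.e. before every value except the first one.
    static String[] splitByLookahead(String source) {
        return source.split("(?=\\B\\d{2}\\.\\d{3})");
    }

    // Splits after every "dd.ddd" that begins where the previous match ended (\G).
    // The trailing empty string is dropped by split().
    static String[] splitByLookbehind(String source) {
        return source.split("(?<=\\G\\d{2}\\.\\d{3})");
    }

    public static void main(String[] args) {
        String source = "71.01663.129";
        System.out.println(Arrays.toString(splitByLookahead(source)));  // prints [71.016, 63.129]
        System.out.println(Arrays.toString(splitByLookbehind(source))); // prints [71.016, 63.129]
    }
}
```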


There is only 1 group with your regex. Use a while loop to enumerate all of them. (See http://www.ideone.com/FNRsz):

String testPattern = "68.65761.25659.01057.67657.14857.06457.41658.77861.16268.641";
Pattern p = Pattern.compile("\\d+\\.\\d{1,3}");
Matcher m = p.matcher(testPattern);

while(m.find())   // <---
   System.out.println(m.group());


Using Guava, a fixed-length Splitter would work well here.

Iterable<String> numbers = Splitter.fixedLength(6).split(testPattern);

If you were to create a Function<String, Double> (called, say, Numbers.doubleParser()), you could even convert the data to numbers easily. (Obviously you could use BigDecimal or whatever rather than Double depending on your needs.)

private static final Splitter SPLITTER = Splitter.fixedLength(6);

...

public void someMethod(String stringToParse) {
  for(Double value : Iterables.transform(SPLITTER.split(stringToParse),
                                         Numbers.doubleParser())) {
    ...
  }
}