Regex for finding list of numbers in a sentence
I have a sentence like this:
"A list of items 1, 2 and 5 containing blah blah blah."
Which could also be something like:
"According to items 2 through 11 there will be blah blah."
Is there an easy regex to grab these numbers? Also, I would need to know whether it was "1 and 5" or "1 through 5" so I could fill in the other numbers if necessary.
You can use the regular expression pattern (?i)(\d+)(?:(?:(?:\s*)(,|and|through)(?:\s*))|.*$) (inside a Java string literal, each backslash must be doubled). The following sample code:
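import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberListDemo {
    public static void main(String[] args) {
        // Group 1: the number. Group 2: the separator after it (",", "and" or "through"),
        // or null when the number is followed by anything else (".*$" then consumes the rest).
        final String ps = "(?i)(\\d+)(?:(?:(?:\\s*)(,|and|through)(?:\\s*))|.*$)";
        final Pattern p = Pattern.compile(ps);
        for (String s : new String[] {
                "A list of items 1, 2 and 5 containing blah blah blah.",
                "According to items 2 THROUGH 11 there will be blah blah."}) {
            System.out.println("***** TEST STRING *****\n" + s + "\n");
            final Matcher m = p.matcher(s);
            int cnt = 0;
            while (m.find()) {
                System.out.println(++cnt + ": G1: " + m.group(1) + " G2: " + m.group(2));
            }
            System.out.println("");
        }
    }
}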
Will output:
***** TEST STRING *****
A list of items 1, 2 and 5 containing blah blah blah.
1: G1: 1 G2: ,
2: G1: 2 G2: and
3: G1: 5 G2: null
***** TEST STRING *****
According to items 2 THROUGH 11 there will be blah blah.
1: G1: 2 G2: THROUGH
2: G1: 11 G2: null
You can use group 1 to get the number and group 2 to determine what your next step will be: "," and "and" mean the next number should be included in your list, "through" means a range should be included, and null means there are no more numbers.
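As a rough sketch of that last step (filling in the missing numbers), the helper below reuses the same pattern and expands "through" ranges into every number in between. The class name NumberListExpander and the method expandNumbers are hypothetical, and it assumes ranges are written low-to-high (a reversed range such as "11 through 2" would add only its first number).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberListExpander {
    // Same pattern as in the answer: group 1 is the number, group 2 the separator after it (or null).
    private static final Pattern LIST_PATTERN =
            Pattern.compile("(?i)(\\d+)(?:(?:(?:\\s*)(,|and|through)(?:\\s*))|.*$)");

    // Hypothetical helper: "2 through 5" becomes 2, 3, 4, 5 and "1, 2 and 5" stays 1, 2, 5.
    // Assumes ranges are written low-to-high.
    public static List<Integer> expandNumbers(String sentence) {
        List<Integer> result = new ArrayList<>();
        Matcher m = LIST_PATTERN.matcher(sentence);
        boolean rangeOpen = false; // true when the previous separator was "through"
        int rangeStart = 0;
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (rangeOpen) {
                // Fill in every number after the range start, up to and including n.
                for (int i = rangeStart + 1; i <= n; i++) {
                    result.add(i);
                }
            } else {
                result.add(n);
            }
            String sep = m.group(2);
            rangeOpen = sep != null && sep.equalsIgnoreCase("through");
            rangeStart = n;
            if (sep == null) {
                break; // a number with no following separator ends the list
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expandNumbers("A list of items 1, 2 and 5 containing blah blah blah."));
        System.out.println(expandNumbers("According to items 2 THROUGH 11 there will be blah blah."));
    }
}
```

Running main should print [1, 2, 5] for the first sentence and [2, 3, 4, 5, 6, 7, 8, 9, 10, 11] for the second. Tracking only the previous separator keeps the logic to a single pass over the matches.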
You can easily extract all numbers from a string by using a pattern such as "\d+", but for phrases like "1 through 5" you need a much clearer definition of what you want to parse.
If you just want to find all numbers (runs of digits) in a string:
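import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitFinder {
    // Returns every run of consecutive digits in s, in order of appearance.
    public static List<String> findDigits(String s) {
        Matcher m = Pattern.compile("\\d+").matcher(s);
        List<String> digits = new ArrayList<>();
        while (m.find()) {
            digits.add(m.group()); // same text as s.substring(m.start(), m.end())
        }
        return digits;
    }
}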
This should work: (\b\d+\s+through\s+\d+)|(\b\d+\s+and\s+\d+)|(\b\d+\b)
Note that \s matches any character in [ \t\n\x0B\f\r].
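To see which alternative fired for each match, you can check which of the three groups is non-null. This is a minimal sketch; the class AlternationDemo and the method classify are hypothetical names, and the pattern is the one above with its backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // Group 1: "N through M", group 2: "N and M", group 3: a lone number.
    // Alternatives are tried left to right at each position, so pairs win over lone numbers.
    private static final Pattern P = Pattern.compile(
            "(\\b\\d+\\s+through\\s+\\d+)|(\\b\\d+\\s+and\\s+\\d+)|(\\b\\d+\\b)");

    // Hypothetical helper: labels each match with the kind of phrase it matched.
    public static List<String> classify(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("range: " + m.group(1));
            } else if (m.group(2) != null) {
                out.add("and: " + m.group(2));
            } else {
                out.add("single: " + m.group(3));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("A list of items 1, 2 and 5 containing blah blah blah."));
        System.out.println(classify("According to items 2 through 11 there will be blah blah."));
    }
}
```

This should print [single: 1, and: 2 and 5] and [range: 2 through 11]. The pattern is case-sensitive, so "2 THROUGH 11" would come out as two single numbers unless you prefix it with (?i); and a longer list such as "1, 2 and 5" is split into a lone number plus an "and" pair.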