Regex for finding list of numbers in a sentence

I have a sentence such as:

"A list of items 1, 2 and 5 containing blah blah blah."

Which could also be something like:

"According to items 2 through 11 there will be blah blah."

Is there an easy regex to grab these numbers? Also, I would need to know whether it was "1 and 5" or "1 through 5" so I could fill in the other numbers if necessary.


You can use the regular expression pattern (?i)(\\d+)(?:(?:(?:\\s*)(,|and|through)(?:\\s*))|.*$) (written here as a Java string literal, hence the doubled backslashes). The following sample code:

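import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberListDemo {
    public static void main(String[] args) {
        // Group 1 captures a number; group 2 captures the separator after it, if any.
        final String ps = "(?i)(\\d+)(?:(?:(?:\\s*)(,|and|through)(?:\\s*))|.*$)";
        final Pattern p = Pattern.compile(ps);
        for (String s : new String[] {
                "A list of items 1, 2 and 5 containing blah blah blah.",
                "According to items 2 THROUGH 11 there will be blah blah."})
        {
            System.out.println("***** TEST STRING *****\n" + s + "\n");
            final Matcher m = p.matcher(s);
            int cnt = 0;
            // Each find() consumes one number plus its separator (or the rest of the line).
            while (m.find()) {
                System.out.println(++cnt + ": G1: " + m.group(1) + " G2: "
                        + m.group(2));
            }
            System.out.println("");
        }
    }
}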

Will output:

***** TEST STRING *****
A list of items 1, 2 and 5 containing blah blah blah.

1: G1: 1 G2: ,
2: G1: 2 G2: and
3: G1: 5 G2: null

***** TEST STRING *****
According to items 2 THROUGH 11 there will be blah blah.

1: G1: 2 G2: THROUGH
2: G1: 11 G2: null

You can use group 1 to get the number and group 2 to decide what your next step will be: "," or "and" means the next number is simply another item in your list, "through" means the next number ends a range, and null means there are no more numbers.
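
To fill in the other numbers, as the question asks, the two groups can drive a small loop. The following is only a sketch built on the pattern above. The class name NumberListExpander and the method expand are made up for illustration, and it assumes that ranges are ascending and that each number fits in an int:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NumberListExpander {
    // Same pattern as above: group 1 is a number, group 2 is the separator after it (or null).
    private static final Pattern P = Pattern.compile(
            "(?i)(\\d+)(?:(?:(?:\\s*)(,|and|through)(?:\\s*))|.*$)");

    static List<Integer> expand(String sentence) {
        List<Integer> result = new ArrayList<>();
        Matcher m = P.matcher(sentence);
        boolean rangePending = false; // true when the previous separator was "through"
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (rangePending && !result.isEmpty()) {
                // Fill in every number after the range start, up to and including n.
                // Assumption: ranges are ascending; a descending range adds nothing here.
                int start = result.get(result.size() - 1);
                for (int i = start + 1; i <= n; i++) {
                    result.add(i);
                }
            } else {
                result.add(n);
            }
            rangePending = "through".equalsIgnoreCase(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [1, 2, 5]
        System.out.println(expand("A list of items 1, 2 and 5 containing blah blah blah."));
        // prints [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
        System.out.println(expand("According to items 2 THROUGH 11 there will be blah blah."));
    }
}
```

Tracking only whether the previous separator was "through" keeps the loop simple: "," and "and" need no special handling because they just mean that another item follows.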


You can easily extract all numbers from a string with a pattern such as "\d+", but for phrases like "1 through 5" you need a much clearer definition of what you want to parse.


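If you just want to find all the numbers in a string:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public List<String> findDigits(String s) {
    String regex = "\\d+";  // one or more consecutive digits
    Matcher m = Pattern.compile(regex).matcher(s);
    List<String> digits = new ArrayList<>();
    while (m.find()) {
        digits.add(m.group());  // the text of the current match
    }
    return digits;
}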


This will do: (\b\d+\s+through\s+\d+)|(\b\d+\s+and\s+\d+)|(\b\d+\b)

Note that \s matches any whitespace character: [ \t\n\x0B\f\r]
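
With this pattern, the group that is non-null tells you which kind of phrase was found. Here is a rough sketch of how that could be checked. The class name NumberPhraseClassifier and the "range"/"pair"/"single" labels are invented for illustration, and Pattern.CASE_INSENSITIVE is an added assumption so that "THROUGH" is also recognised:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class NumberPhraseClassifier {
    // Group 1: "N through M", group 2: "N and M", group 3: a single number.
    // CASE_INSENSITIVE is an assumption added here; the pattern above is case-sensitive.
    private static final Pattern P = Pattern.compile(
            "(\\b\\d+\\s+through\\s+\\d+)|(\\b\\d+\\s+and\\s+\\d+)|(\\b\\d+\\b)",
            Pattern.CASE_INSENSITIVE);

    static List<String> classify(String sentence) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(sentence);
        while (m.find()) {
            // Exactly one of the three groups is non-null for each match.
            if (m.group(1) != null) {
                out.add("range: " + m.group(1));
            } else if (m.group(2) != null) {
                out.add("pair: " + m.group(2));
            } else {
                out.add("single: " + m.group(3));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [single: 1, pair: 2 and 5]
        System.out.println(classify("A list of items 1, 2 and 5 containing blah blah blah."));
        // prints [range: 2 through 11]
        System.out.println(classify("According to items 2 through 11 there will be blah blah."));
    }
}
```

The matched text of a range or pair still has to be split into its two numbers, for example by running "\d+" over it.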
