Java regex to extract integer from large body of text
I need to extract a value from a large body of text. I'm assuming the best way to do this would be to use a regular expression. If anyone thinks there's a better way to do it, feel free to offer up a suggestion.
The value I need to extract always appears in a string of the form:
[formatted_int_value] results across [the_integer_value_I_need_to_extract] pages
e.g: 3,342 results across 67 pages
In the example above the value I'm trying to extract is 67. Also note that each word in the example above may be separated by one or more whitespace and/or newline characters. And, as mentioned above, this text is part of a larger body of text (I'm screen scraping a web page).
Can someone help me with a regex to extract the int value I need (67 in my example above) that takes into consideration the conditions I've provided?
Thanks.
The regex is quite straightforward:
([\d,]+)\s+results\s+across\s+(\d+)\s+pages
The 67 would be in group 2, the other number (if you need it) in group 1.
// Illustrated in JavaScript; the pattern itself is the same in Java.
var text = "some text here 3,342 results across 67 pages some more text here";
var regex = /([\d,]+)\s+results\s+across\s+(\d+)\s+pages/;
var matches = regex.exec(text);
/* matches will be this array:
["3,342 results across 67 pages", "3,342", "67"]
---- entire match -------------- --g1--- -g2-
*/
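Since the question is about Java, here is a minimal sketch of the same pattern using java.util.regex. The class name PageCountExtractor and the helper extractPageCount are hypothetical names chosen for illustration, and returning an OptionalInt when the phrase is missing is an assumption about how you want to handle that case. Note that the backslashes must be doubled inside a Java string literal, and that \s already matches newlines, so no extra flags are needed.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageCountExtractor {
    // Same pattern as above; backslashes are doubled for the Java string literal.
    private static final Pattern RESULTS_PATTERN =
            Pattern.compile("([\\d,]+)\\s+results\\s+across\\s+(\\d+)\\s+pages");

    // Hypothetical helper: returns the page count, or empty if the phrase is not found.
    public static OptionalInt extractPageCount(String text) {
        Matcher m = RESULTS_PATTERN.matcher(text);
        if (m.find()) {
            // Group 2 holds the plain integer; group 1 holds the formatted result count.
            return OptionalInt.of(Integer.parseInt(m.group(2)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Words separated by a mix of spaces and a newline, as in the question.
        String text = "some text here 3,342 results\n across   67 pages some more text here";
        System.out.println(extractPageCount(text));
    }
}
```

Using find() rather than matches() lets the pattern locate the phrase anywhere in the scraped page without wrapping it in .* on both sides.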
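// (?s) lets '.' match newlines; \s+ allows any run of whitespace between words.
// Assumes the phrase is present; otherwise parseInt receives the whole text and throws.
int theIntYouWantToExtract = Integer.parseInt(yourLongText.replaceAll(
    "(?s).*?([\\d,]+)\\s+results\\s+across\\s+(\\d+)\\s+pages.*",
    "$2"));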