Extracting number preceding a particular text using a regex

I'm looking for a regex to extract two numbers from the same text. They can be extracted independently; there's no need to get both in one go.

I'm using Yahoo Pipes.

Source Text: S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)

I need to extract 1,475 as a number and, separately (it can be done in another step), 137 as a number.

I got the following pattern from someone quite helpful on a different forum:

\b(\d+(,\d+)*)\s+(sqft|sqm)

But when I use it with a replacement of $1, it returns the whole source text instead of just the number I want (i.e. 1,475 or 137, depending on whether I run \b(\d+(,\d+))\s+(sqft) or \b(\d+(,\d+))\s+(sqm)).

What am I doing wrong?


Well you could do this by iterating through the matches and getting the results that way.
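For what it's worth, here is a rough sketch of the "iterate through the matches" idea in Python. Yahoo Pipes itself doesn't run Python, so treat this purely as an illustration; the group names `num` and `unit` and the variable `areas` are my own, not anything from the question.

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in)"
        " - Apartment, 10 Anson Road (D02)")

# Walk over every "<number> <unit>" occurrence and key the number by its unit.
# The numbers are kept as strings here, exactly as they appear in the text.
areas = {
    m.group("unit"): m.group("num")
    for m in re.finditer(r"\b(?P<num>\d+(?:,\d+)*)\s+(?P<unit>sqft|sqm)\b", text)
}

print(areas)
```

Because each match is inspected on its own, nothing outside the matched span ever ends up in the result.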

But if you want to use the replace method then this could work:

^.*?(?<sqft>\d+(,\d+)*)\s?sqft.*?(?<sqm>\d+(,\d+)*)\s?sqm.*$

And then replace with:

${sqft}
${sqm}

Here it is in action.

This will work with or without a comma in the sqft or sqm numbers. And the .* at the beginning, middle, and end forces it to match the entire string so that the replacement text eliminates everything except for what you're after.
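To make the difference concrete, here is a small Python sketch comparing an unanchored replace with the whole-string version. Note that Python spells named groups as `(?P<name>...)` and refers to them as `\g<name>` in the replacement, which differs from the `(?<name>...)` / `${name}` syntax above; the variable names are just for illustration.

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in)"
        " - Apartment, 10 Anson Road (D02)")

# Unanchored pattern: only the matched span ("1,475 sqft") is replaced,
# so everything else in the source text survives the substitution.
partial = re.sub(r"\b(\d+(,\d+)*)\s+sqft", r"\1", text)

# Whole-string pattern: the match covers the entire input, so the
# replacement text is all that is left afterwards.
full = re.compile(
    r"^.*?(?P<sqft>\d+(,\d+)*)\s?sqft.*?(?P<sqm>\d+(,\d+)*)\s?sqm.*$"
)
sqft_only = full.sub(r"\g<sqft>", text)
sqm_only = full.sub(r"\g<sqm>", text)

print(partial)
print(sqft_only)
print(sqm_only)
```

The first result still contains the price and the address, which is the behaviour described in the question; the other two contain only the captured number.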


Since you didn't specify a language, here is some Python:

import re

s = "$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)"
print(re.search(r'\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm', s).groups())
# prints ('1,475', '137')

This searches for a run of digits, commas, or periods after a word boundary, followed by an optional space and the word 'sqft', then an optional space, a slash, and an optional space, followed by another run of digits, commas, or periods, an optional space, and the word 'sqm'.

This should allow your formatting to be pretty loose (optional spaces, thousands and decimal separators).
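Since the question asks for the values "as a number", here is one possible follow-up step, sketched under the assumption that a comma is always a thousands separator and the areas are whole numbers. The helper name `extract_area` is made up for this example.

```python
import re

def extract_area(text, unit):
    """Return the integer that precedes `unit` in `text`, or None if absent."""
    # Digits with optional comma-separated thousands groups, then the unit.
    pattern = r"\b(\d+(?:,\d{3})*)\s?" + re.escape(unit) + r"\b"
    match = re.search(pattern, text)
    if match is None:
        return None
    # Assumption: commas are thousands separators, so they can be dropped.
    return int(match.group(1).replace(",", ""))

s = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in)"
     " - Apartment, 10 Anson Road (D02)")
sqft = extract_area(s, "sqft")
sqm = extract_area(s, "sqm")
```

If your data can contain decimals or European-style separators, the comma handling would need to change.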


In perl, I would write something like:

# The binding operator is =~; $1 holds the first capture group on success.
if ($line =~ m/\b([0-9.,]+) sqft/)
{
  $sqft = $1;
}
else
{
  $sqft = undef;
}

if ($line =~ m/\b([0-9.,]+) sqm/)
{
  $sqm = $1;
}
else
{
  $sqm = undef;
}


You may wish to consider the situations discussed in this answer in crafting a regex for numbers.
