Ruby regex matching the cent sign ¢

I am having difficulty matching the string "79¢ /lb" with this regex: (\$|¢)\d+(.\d{1,2})?

It works fine when the cent symbol appears at the beginning, but I don't know what needs to be added to handle it near the end of the string.

Basically I'm planning to extract a float value from this price tag, that is, 0.79. Thanks in advance. I'm using Ruby.


Well, that regex requires the $ or ¢ to come immediately before the digits. To match 79¢ /lb, you'll need something like:

(\d+)¢

where the ¢ comes after the digits.
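
As a rough sketch of how that pattern could get to the 0.79 the question asks for: capture the digits, then divide by 100. The helper name trailing_cents_to_dollars is made up for illustration and is not part of the question.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the price in dollars as a Float,
# or nil if no "NN¢" is found in the tag.
def trailing_cents_to_dollars(tag)
  m = tag.match(/(\d+)¢/)   # digits directly followed by the cent sign
  m && m[1].to_i / 100.0    # cents -> dollars
end

p trailing_cents_to_dollars("79¢ /lb")   # => 0.79
p trailing_cents_to_dollars("$1.50")     # => nil
```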

A single regex to match the many varied formats that you're likely to see will be a little more complex. I would suggest either doing it as multiple regexes (for simplicity), or asking another question here specifying the full range of strings you want to capture the prices from.


It's easiest to figure out the right regex when you consider each case separately. If I understand your question correctly, there are 4 cases:

  1. cents, with the ¢ symbol before the price
  2. cents, with the ¢ symbol after the price
  3. dollars (and optional cents), with the $ symbol before the price
  4. dollars (and optional cents), with the $ symbol after the price

First, write a regex for each case separately:

  1. ¢(\d{1,2})\b
  2. \b(\d{1,2})¢
  3. \$(\d+(?:\.\d{2})?)\b
  4. \b(\d+(?:\.\d{2})?)\$

Then, combine them into a single regex:

regex = %r{
  ¢(\d{1,2})\b          | # case 1
  \b(\d{1,2})¢          | # case 2
  \$(\d+(?:\.\d{2})?)\b | # case 3
  \b(\d+(?:\.\d{2})?)\$   # case 4
}x

Then, match to your heart's content:

string_with_prices.scan(regex) do |match|
  # If there was a match in the first two groups, it's for cents
  cents   = $1 || $2
  # ...and the last two groups are dollars.
  dollars = $3 || $4
  if cents
    puts "found price (cents): #{cents}"
  elsif dollars
    puts "found price (dollars): #{dollars}"
  else
    puts 'unknown match!'
  end
end
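
Since the goal is a float such as 0.79, the same four-case regex can also feed a small conversion step. This is only a sketch under the assumption that cents should be divided by 100 and dollars kept as they are; PRICE_REGEX and extract_prices are illustrative names.

```ruby
# encoding: utf-8

# The combined regex from above, stored in a constant (name is an assumption).
PRICE_REGEX = %r{
  ¢(\d{1,2})\b          | # case 1
  \b(\d{1,2})¢          | # case 2
  \$(\d+(?:\.\d{2})?)\b | # case 3
  \b(\d+(?:\.\d{2})?)\$   # case 4
}x

# Hypothetical helper: returns every price found in `text` as a Float in dollars.
def extract_prices(text)
  # scan yields one array of the four capture groups per match
  text.scan(PRICE_REGEX).map do |c1, c2, d1, d2|
    cents = c1 || c2
    cents ? cents.to_i / 100.0 : (d1 || d2).to_f
  end
end

p extract_prices("79¢ /lb")               # => [0.79]
p extract_prices("apples $1.50, gum ¢5")  # => [1.5, 0.05]
```

Taking the groups as block parameters avoids relying on $1 through $4, which some readers find easier to follow.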

Note: To test this code, I had to use 'c' instead of '¢' because Ruby was telling me invalid multibyte char (US-ASCII). To avoid this issue, declare the source file's encoding (for example with a # encoding: utf-8 magic comment on the first line), or else embed the character's escape directly in the regex, e.g. %r{\u00A2} instead of %r{¢}. The cent sign is U+00A2.
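
One way to keep the source file ASCII-only is the Unicode escape for the cent sign, U+00A2. A minimal sketch, with CENT and TRAILING_CENTS as illustrative names:

```ruby
# "\u00A2" is the cent sign (U+00A2); the escape keeps this file ASCII-only.
CENT = "\u00A2"

# Same pattern as case 2 above, written without a literal cent sign.
TRAILING_CENTS = /\b(\d{1,2})\u00A2/

m = TRAILING_CENTS.match("79#{CENT} /lb")
p m[1]   # => "79"
```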


Maybe you don't need to do everything in your regexp:

# price is the string that contains the price
if price =~ /\$|¢/
  # take the first run of digits (as a string) once a currency symbol is present
  value = price[/\d+/]
end

Or something along those lines.
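
Filled out a little, that idea might look like the sketch below. It assumes a price tag holds a single number and that a ¢ anywhere in the string means the number is in cents; simple_price is a made-up name.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the price in dollars as a Float,
# or nil if the string has no number or no currency symbol.
def simple_price(price)
  number = price[/\d+(?:\.\d+)?/]   # first number in the string, e.g. "79" or "1.50"
  return nil unless number
  if price.include?("¢")
    number.to_f / 100               # cents -> dollars
  elsif price.include?("$")
    number.to_f
  end
end

p simple_price("79¢ /lb")   # => 0.79
p simple_price("$1.50")     # => 1.5
p simple_price("2 lb")      # => nil
```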
