Can somebody explain a money regex that just checks if the value matches some pattern?

There are multiple posts on here that capture the value, but I just want to check whether the value matches. Put more generally, I'm looking to understand the difference between checking a value and "capturing" a value.

Here is a post that explains some of a money regex, but I don't understand it at all.

In the current case, the acceptable money formats are the following:

.50
50
50.00
50.0
$5000.00
$.50

I don't want commas (people should know that's ridiculous).

The things I'm having trouble with are:

  1. Allowing for a $ at the start of the value (but still optional)
  2. Allowing for only 1 decimal point (but not allowing it at the end)
  3. Understanding how it's working inside
  4. Also understanding how to get a normalized version (only digits and the optional decimal point) out of it that strips the dollar sign.

My current regex (which obviously doesn't work right) is:

# I'm checking the Boolean of the following:
re.compile(r'^[\$][\d\.]$').search(value)

(Note: I'm working in Python)


Assuming you want to allow $5. but not 5., the following will accept your language:

import re

# Three anchored alternatives, each with its own capture group for the amount.
money = re.compile('|'.join([
  r'^\$?(\d*\.\d{1,2})$',  # e.g., $.50, .50, $1.50, $.5, .5
  r'^\$?(\d+)$',           # e.g., $500, $5, 500, 5
  r'^\$(\d+\.?)$',         # e.g., $5.
]))

Important pieces to understand:

  • ^ and $ match only at the beginning and end of the input string, respectively (in Python, $ also matches just before a single trailing newline; use \Z if that matters)
  • \. matches a literal dot
  • \$ matches a literal dollar sign
    • \$? matches a dollar sign or nothing (i.e., an optional dollar sign)
  • \d matches any single digit (0-9)
    • \d* matches runs of zero or more digits
    • \d+ matches runs of one or more digits
    • \d{1,2} matches any single digit or a run of two digits
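
To see the pieces above in isolation, here is a small sketch. The two pattern names are made up for illustration and are not part of the answer's regex:

```python
import re

# Each pattern is anchored with ^ and $, so it must describe the whole string.
optional_dollar = re.compile(r'^\$?\d+$')        # optional $, then one or more digits
cents_only = re.compile(r'^\d*\.\d{1,2}$')       # zero or more digits, a dot, 1-2 digits

print(bool(optional_dollar.match('$500')))  # True
print(bool(optional_dollar.match('500')))   # True: the $ is optional
print(bool(optional_dollar.match('$')))     # False: \d+ needs at least one digit
print(bool(cents_only.match('.5')))         # True: \d* may match nothing
print(bool(cents_only.match('1.500')))      # False: at most two digits after the dot
```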

The parenthesized subpatterns are capture groups: all text in the input matched by the subexpression in a capture group will be available in matchobj.group(index). The dollar sign won't be captured because it's outside the parentheses.
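
As a rough illustration of checking versus capturing, using a simplified single-alternative pattern (an assumption made for brevity, not the full pattern above):

```python
import re

pattern = re.compile(r'^\$?(\d+)$')

m = pattern.match('$500')

# "Checking": all we care about is whether a match object came back.
is_money = m is not None

# "Capturing": pull out the text matched by the parenthesized part.
amount = m.group(1)   # '500'  (the $ sits outside the parentheses)
whole = m.group(0)    # '$500' (group 0 is always the entire match)

print(is_money, amount, whole)
print(pattern.match('abc') is None)  # True: no match, so nothing to capture
```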

Because Python doesn't support multiple capture groups with the same name (!!!) we must search through matchobj.groups() for the one that isn't None. This also means you have to be careful when modifying the pattern to use (?:...) for every group except the amount.
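
A sketch of both points: reusing a group name fails at compile time, and a small helper (amount_of is a hypothetical name) can pick out whichever group actually matched:

```python
import re

# Reusing a group name, even across alternatives, is rejected by re.compile.
try:
    re.compile(r'^(?P<amt>\d+)$|^\$(?P<amt>\d+\.)$')
    duplicate_ok = True
except re.error:
    duplicate_ok = False

money = re.compile('|'.join([
    r'^\$?(\d*\.\d{1,2})$',
    r'^\$?(\d+)$',
    r'^\$(\d+\.?)$',
]))

def amount_of(value):
    """Return the captured amount, or None if value is not accepted."""
    m = money.match(value)
    if m is None:
        return None
    # Only one alternative matches, so exactly one group is not None.
    return next(g for g in m.groups() if g is not None)

print(duplicate_ok)        # False
print(amount_of('$.50'))   # .50
print(amount_of('$5.'))    # 5.
print(amount_of('5.'))     # None
```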

Tweaking Mark's nice test harness, we get

for test, expected in tests:
    result = money.match(test)
    is_match = result is not None
    if is_match == expected:
        status = 'OK'
        if result:
            # Exactly one alternative matched, so exactly one group is not None.
            amt = [x for x in result.groups() if x is not None].pop()
            status += ' (%s)' % amt
    else:
        status = 'Fail'
    print(test + '\t' + status)

Output:

.50     OK (.50)
50      OK (50)
50.00   OK (50.00)
50.0    OK (50.0)
$5000   OK (5000)
$.50    OK (.50)
$5.     OK (5.)
5.      OK
$5.000  OK
5000$   OK
$5.00$  OK
$-5.00  OK
$5,00   OK
        OK
$       OK
.       OK
.5      OK (.5)


Here's a regex you can use:

regex = re.compile(r'^\$?(\d*(\d\.?|\.\d{1,2}))$')

Here's a test-bed I used to test it. I've included all your tests, plus some of my own. I've also included some negative tests, as making sure that it doesn't match when it shouldn't is just as important as making sure that it does match when it should.

tests = [
    ('.50', True),
    ('50', True),
    ('50.00', True),
    ('50.0', True),
    ('$5000', True),
    ('$.50', True),
    ('$5.', True),
    ('$5.000', False),
    ('5000$', False),
    ('$5.00$', False),
    ('$-5.00', False),
    ('$5,00', False),
    ('', False),
    ('$', False),
    ('.', False),
]

import re
regex = re.compile(r'^\$?(\d*(\d\.?|\.\d{1,2}))$')
for test, expected in tests:
    result = regex.match(test)
    is_match = result is not None
    print(test + '\t' + ('OK' if is_match == expected else 'Fail'))

To get the value without the $, you can use the captured group of a successful match (result is None when nothing matched, so check that first):

print(result.group(1))
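
Put together, a minimal sketch of the "normalized version" might look like this. The normalize helper and the conversion to Decimal are illustrative additions, not part of the answer above:

```python
import re
from decimal import Decimal

regex = re.compile(r'^\$?(\d*(\d\.?|\.\d{1,2}))$')

def normalize(value):
    """Return the amount without the dollar sign, or None if value is invalid."""
    m = regex.match(value)
    return m.group(1) if m else None

print(normalize('$5000'))   # 5000
print(normalize('$.50'))    # .50
print(normalize('5000$'))   # None

# Optional next step (an assumption about what you want to do with the amount):
# Decimal accepts strings such as '.50' directly.
price = Decimal(normalize('$.50'))
print(price == Decimal('0.50'))  # True
```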


"Also understanding how to get a normalized version (only digits and the optional decimal point) out of it that strips the dollar sign."

This is also known as "capturing" the value ;)

Working off Aaron's base example:

/^\$?(\d+(?:\.\d{1,2})?)$/

Then the amount (without the dollar sign) will be in capture group 1.
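
In Python the surrounding slashes are dropped and the pattern goes into a raw string. A quick sketch (amount_re is just an illustrative name); note that this particular pattern requires at least one digit before the decimal point, so it would not accept .50:

```python
import re

amount_re = re.compile(r'^\$?(\d+(?:\.\d{1,2})?)$')

m = amount_re.match('$50.00')
print(m.group(1))               # 50.00

# (?:...) groups without capturing, so the amount is the only group.
print(m.groups())               # ('50.00',)

print(amount_re.match('.50'))   # None: \d+ needs a leading digit
print(amount_re.match('$5.'))   # None: a dot must be followed by 1-2 digits
```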


I believe the following regex will meet your needs:

/^\$?(\d*(\.\d\d?)?|\d+)$/

It allows for an optional '$'. It allows for an optional decimal, but requires at least one but not more than two digits after the decimal if the decimal is present.

Edit: The outer parentheses will catch the whole numeric value for you.
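
A quick check of this pattern in Python, as a sketch. Because every piece of the first alternative is optional, it also matches an empty string and a lone $, so the hypothetical is_money helper below additionally requires a non-empty captured amount:

```python
import re

loose = re.compile(r'^\$?(\d*(\.\d\d?)?|\d+)$')

def is_money(value):
    """True only if the pattern matches and actually captured some amount."""
    m = loose.match(value)
    return bool(m and m.group(1))

print(loose.match('$50.00').group(1))   # 50.00
print(loose.match('.5').group(1))       # .5
print(loose.match('') is not None)      # True: the bare pattern accepts ''
print(loose.match('$') is not None)     # True: and a lone dollar sign
print(is_money(''), is_money('$'))      # False False
print(is_money('$50.00'))               # True
```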
