Phone Number Regular Expression (Regex) in Python
Dive Into Python gives an amazing little tutorial on creating a regular expression for phone numbers: http://diveintopython3.ep.io/regular-expressions.html#phonenumbers
The final version looks like this:
phone_re = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$', re.VERBOSE)
This works fine for almost all examples I can come up with. However, I found a pretty big failure that I can't seem to fix.
If a group of 3 digits comes before the phone number, it works fine. For example: "500 dollars off, call 123-456-7891"
If a group of 3 digits comes after the phone number, it fails. For example: "Call 123-456-7891 for a discount of up to 500"
Any ideas on a fix that would work for both examples?
The (\d*)$ at the end of the pattern anchors the match to the end of the string (the $ signifies "end of line"), so the match must run all the way to the end, and any digits found there are captured in the last group. Try removing the $ if you're matching against a larger string where the phone number may not be at the end of the line.
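To make the effect of the $ concrete, here is a small sketch comparing the pattern from the question with and without the anchor. The helper name groups and the third sample string are made up for illustration. Note that in this sketch, dropping the $ lets the pattern match in the middle of a sentence, but \D* can still reach across words to pick up later digits as the last group.

```python
import re

# The pattern from the question, with and without the trailing $.
anchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
unanchored = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)')

before = "500 dollars off, call 123-456-7891"
after = "Call 123-456-7891 for a discount of up to 500"
middle = "Call 123-456-7891 by May 5 to save"  # hypothetical extra sample


def groups(pattern, text):
    """Return the captured groups of the first match, or None if no match."""
    m = pattern.search(text)
    return m.groups() if m else None


print(groups(anchored, before))    # number found, empty last group
print(groups(anchored, after))     # trailing 500 lands in the last group
print(groups(anchored, middle))    # $ cannot be satisfied, so no match
print(groups(unanchored, middle))  # matches, but \D* still reaches the 5
```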
Here's your original, with some spaces (use re.VERBOSE, or remove the spaces):
(\d{3}) \D* (\d{3}) \D* (\d{4}) \D* (\d*)
The \D* will match anything that's not a digit, including words. Maybe you should try this:
(\d{3}) \W* (\d{3}) \W* (\d{4}) \W* (\d*)
The \W* matches any run of non-word characters, that is, anything other than letters, digits, and underscores. It will match (222) - 222 - 2222. However, it will not match if there is a letter between the numbers, as in (222) x 222 - 2222. The last part of the match, (\d*), appears to be looking for an extension. These can be formatted in a variety of ways, so I suggest you either drop it or refine it based on how you expect your data to look. And, like Amber says, you should probably drop the $.
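As a rough sketch of the \W* suggestion, the snippet below compiles the spaced-out pattern with re.VERBOSE (without the $) and runs it against the two sentences from the question plus the two formatting examples. The comments inside the pattern are my own labels for the groups, not part of the original.

```python
import re

# \W* version of the pattern, without the trailing $.
# re.VERBOSE lets the pattern contain whitespace and comments.
phone_re = re.compile(r'''
    (\d{3})   # first three digits (area code)
    \W*       # optional separators: spaces, dashes, parentheses, dots
    (\d{3})   # next three digits
    \W*
    (\d{4})   # last four digits
    \W*
    (\d*)     # optional extension (may be empty)
''', re.VERBOSE)

samples = [
    "500 dollars off, call 123-456-7891",
    "Call 123-456-7891 for a discount of up to 500",
    "(222) - 222 - 2222",
    "(222) x 222 - 2222",
]

for text in samples:
    m = phone_re.search(text)
    print(repr(text), '->', m.groups() if m else None)
```

One trade-off to keep in mind: because \W* does not cross letters, an extension written as "x1234" is not captured by this version, while bare digits directly after the number (separated only by spaces or punctuation) still are.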