extracting phone numbers
I'm trying to extract phone numbers from a set of data. It has to be able to extract international and local numbers from each country.
The rules I've laid out for it are:
1. Look for the international symbol (+), indicating it's an international dialing number with a valid extension (from +1 to +999).
2. If the plus symbol is present, make sure the next character is a number.
3. If there is none, check that the length is between 7 and 10 digits.
4. If the number is divided (correctly, per international standards) by either a hyphen (-) or a space, make sure the number of digits in each group is either 3 or 4.
What I've got so far is:
\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])
That's for international, and the local search is \d{7,10}
The thing is that it doesn't actually pick up numbers with spaces or hyphens in them. Can anybody give me some advice on it?
\d already means "digit", so you shouldn't put another [0-9] after it (which means the same).
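To make that concrete, here is a small sketch in Python (my choice of language for illustration; the question doesn't say which regex engine is in use). It shows that \d{4}[0-9] asks for five digits, not four, which is one reason the 4-3-4 pattern in the question fails on numbers grouped that way. Note that in Python 3 \d also matches non-ASCII digits unless re.ASCII is passed, so the flag is used here to keep \d equivalent to [0-9].

```python
import re

# \d{4}[0-9] means "four digits, then one more digit": five in total.
five_digits = re.compile(r"\d{4}[0-9]", re.ASCII)
four_digits = re.compile(r"\d{4}", re.ASCII)

print(five_digits.fullmatch("1234") is not None)   # False: a fifth digit is required
print(five_digits.fullmatch("12345") is not None)  # True
print(four_digits.fullmatch("1234") is not None)   # True
```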
In the same vein, [1-999] doesn't mean what you think it does. It in fact matches one (1) digit between 1 and 9. You probably want \d{1,3}, although that would also match 0.
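A quick way to see the character-class behaviour, again sketched in Python. The last pattern, [1-9]\d{0,2}, is my own suggestion rather than something from the question: it is one possible way to accept 1 through 999 while rejecting a leading 0, if that is what you need.

```python
import re

# [1-999] is a character class: the range 1-9 plus a redundant 9.
# It matches exactly one character.
char_class = re.compile(r"[1-999]")
print(char_class.fullmatch("5") is not None)   # True
print(char_class.fullmatch("42") is not None)  # False: two characters
print(char_class.fullmatch("0") is not None)   # False: 0 is outside 1-9

# \d{1,3} accepts one to three digits, including "0" and "000".
loose = re.compile(r"\d{1,3}", re.ASCII)
print(loose.fullmatch("0") is not None)        # True

# Hypothetical alternative: first digit 1-9, then up to two more digits.
strict = re.compile(r"[1-9]\d{0,2}", re.ASCII)
print(strict.fullmatch("0") is not None)       # False
print(strict.fullmatch("999") is not None)     # True
```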
Then, you're only allowing one variation of dividing blocks (4-3-4) - why? This is not going to match many, many valid phone numbers.
I would suggest the following:
Search your string using the regex \+?(?=\d)[\d\s-]{7,13}\b
to grab anything that remotely looks like a phone number. Perhaps you also want to include parentheses and slashes in the allowed character list: \+?(?=\d)[\d\s/()-]{7,14}\b
Then process and validate those strings separately, best after removing all punctuation/whitespace (except the +).
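The two steps above could be sketched roughly like this in Python. The function names (extract_candidates, normalize, looks_plausible) are made up for illustration. The local length check (7 to 10 digits) comes from the question's own rules; the international check (8 to 15 digits, no leading 0) is an assumption of mine, with 15 being the E.164 maximum. Treat it as a starting point, not a real validator.

```python
import re

# Loose candidate pattern from the answer: optional +, then 7-14
# digits/separators, ending on a word boundary.
CANDIDATE = re.compile(r"\+?(?=\d)[\d\s/()-]{7,14}\b")

def extract_candidates(text):
    """Return every substring that remotely looks like a phone number."""
    return [m.group(0) for m in CANDIDATE.finditer(text)]

def normalize(candidate):
    """Strip all punctuation and whitespace, keeping a leading +."""
    digits = re.sub(r"\D", "", candidate)
    return "+" + digits if candidate.lstrip().startswith("+") else digits

def looks_plausible(number):
    """Length-only check on a normalized number (assumed limits, see above)."""
    digits = number.lstrip("+")
    if number.startswith("+"):
        # Assumption: 8-15 digits and a country code that does not start with 0.
        return 8 <= len(digits) <= 15 and digits[0] != "0"
    # Local rule taken from the question: 7 to 10 digits.
    return 7 <= len(digits) <= 10

text = "Call +1 555-123-4567 or 555-1234 today."
numbers = [normalize(c) for c in extract_candidates(text)]
print(numbers)  # ['+15551234567', '5551234']
print([n for n in numbers if looks_plausible(n)])
```

One thing to watch: the {7,14} quantifier counts separators as well as digits, so a number written with many spaces or hyphens can exceed the limit and get cut short. Raise the upper bound if your data looks like that.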
I'm not sure it will be possible to create a regex to match every country - some countries have conflicting rules.
It's entirely possible to have, e.g., two valid local numbers contained within one valid international number.
You might want to start by looking at some of the answers to this question:
A comprehensive regex for phone number validation
If you're looking to create something definitive for every country, good luck, and you'll probably need to spend a while with some technical standards...
For example, both 177 and 186-0039-011-81-90-1177-1177 are valid phone numbers in the same country.