grep with regex for phone number
I would like to get the phone numbers from a file. I know the numbers come in different forms. I can handle each form on its own, but I don't know how to write one regex that covers them all. For example:
1. xxx-xxx-xxxx
2. (xxx)xxx-xxxx
3. xxx xxx xxxx
4. xxxxxxxxxx
I can only handle 1, 3, and 4 together:
grep '[0-9]\{3\}[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}' file
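To see which format that pattern leaves out, here is a quick sketch that feeds it one sample of each form. It assumes GNU grep (\? inside a basic regex is a GNU extension), and count_question is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count input lines that contain a match for the
# pattern from the question. Assumes GNU grep (\? in a basic regex).
count_question() {
  grep -c '[0-9]\{3\}[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}'
}

# One sample per format; only the parenthesized one fails to match.
printf '%s\n' '123-456-7890' '(123)456-7890' '123 456 7890' '1234567890' | count_question
# prints 3
printf '%s\n' '(123)456-7890' | count_question
# prints 0
```

The ")" after the area code is neither a digit nor one of the optional separators [ -], so nothing in the pattern can step over it.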
Is there a single regex that can handle all four of these forms?
grep '\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}' file
Explanation:
- ([0-9]\{3\}) matches three digits inside parentheses,
- \| means "or",
- [0-9]\{3\} matches three digits not inside parentheses,

...all wrapped in grouping parentheses, \(...\), so that the rest of the regex behaves the same no matter which alternative matches.
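A minimal check of that pattern against one sample of each format, assuming GNU grep (\| and \? are GNU extensions to basic regexes); phone_count is a hypothetical helper name. The last case shows a limitation: nothing anchors the match, so an 11-digit run still contains a 10-digit match.

```shell
#!/bin/sh
# Hypothetical helper: count lines containing a number in any of the
# four formats. Assumes GNU grep for \| and \? in a basic regex.
phone_count() {
  grep -c '\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?[0-9]\{3\}[ -]\?[0-9]\{4\}'
}

printf '%s\n' '123-456-7890' '(123)456-7890' '123 456 7890' '1234567890' | phone_count
# prints 4

# The match is not anchored, so a 10-digit substring of a longer run matches too.
printf '%s\n' '12345678901' | phone_count
# prints 1
```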
Each of the four phone number patterns can be matched on its own:
1. xxx-xxx-xxxx: grep -o '[0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}' file.txt
2. (xxx)xxx-xxxx: grep -o '([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}' file.txt
3. xxx xxx xxxx: grep -o '[0-9]\{3\}\s[0-9]\{3\}\s[0-9]\{4\}' file.txt
4. xxxxxxxxxx: grep -o '[0-9]\{10\}' file.txt
All together:
grep -o '\([0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}\)\|\(([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}\)\|\([0-9]\{10\}\)\|\([0-9]\{3\}\s[0-9]\{3\}\s[0-9]\{4\}\)' file.txt
Of course, the regex above could be simplified, but we can also leave that simplification to grep itself.
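Because the combined command uses -o, it also works on free text: each match is printed on its own line. A small sketch, assuming GNU grep (\| and \s are GNU extensions); extract_numbers is a hypothetical helper, and the \- escapes are written as a plain -, which needs no escaping outside a bracket expression.

```shell
#!/bin/sh
# Hypothetical helper: print every phone number found on stdin, one per line.
# Assumes GNU grep for \| and \s in a basic regex.
extract_numbers() {
  grep -o '\([0-9]\{3\}-[0-9]\{3\}-[0-9]\{4\}\)\|\(([0-9]\{3\})[0-9]\{3\}-[0-9]\{4\}\)\|\([0-9]\{10\}\)\|\([0-9]\{3\}\s[0-9]\{3\}\s[0-9]\{4\}\)'
}

printf '%s\n' 'Call 123-456-7890 or (123)456-7890, fax 123 456 7890, cell 1234567890.' |
  extract_numbers
# 123-456-7890
# (123)456-7890
# 123 456 7890
# 1234567890
```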
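This is just a modified version of Alan Moore's solution. It guards against the edge case where the last part of the number has more than four digits, because the final four digits must be followed by a space. (The start of the match is not anchored, though, so an extra leading digit is not caught, and a number at the very end of a line is missed because no space follows it.)
grep '\(\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?\)\{2\}[0-9]\{4\} ' file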
Explanation:
- \(([0-9]\{3\})\|[0-9]\{3\}\) matches exactly three digits (e.g. 234), with or without surrounding parentheses. \| performs the 'OR' operation.
- The first \( ... \) groups the above format together with a following space, a hyphen, or no separator at all; [ -]\? does that.
- \{2\} matches exactly two occurrences of the above.
- '[0-9]\{4\} ' (note the trailing space) matches exactly one 4-digit number followed by a space.
And it's a bit shorter as well. Tested on RHEL and Ubuntu. Cheers!!
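The trailing space in that pattern does real work. A small sketch, assuming GNU grep, with a hypothetical helper count_with_space: a number followed by a space matches, while a number with five trailing digits and a number at the very end of a line both do not. The second point matters for a file with one number per line.

```shell
#!/bin/sh
# The pattern from this answer, unchanged (note the trailing space).
# Assumes GNU grep for \| and \? in a basic regex.
pattern='\(\(([0-9]\{3\})\|[0-9]\{3\}\)[ -]\?\)\{2\}[0-9]\{4\} '

# Hypothetical helper: count lines that contain a match.
count_with_space() {
  grep -c "$pattern"
}

printf '%s\n' '123-456-7890 (office)' | count_with_space   # prints 1
printf '%s\n' '123-456-78901 ' | count_with_space          # prints 0: five trailing digits
printf '%s\n' '123-456-7890' | count_with_space            # prints 0: no space after the number
```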
You can just OR (|) your regexes together; it will be more readable that way too!
My first thought is that you may find it easier to see if your candidate number matches against one of four regular expressions. That will be easier to develop/debug, especially as/when you have to handle additional formats in the future.
grep -P '[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}\ [0-9]{3}\ [0-9]{4}|[0-9]{10}|\([0-9]{3}\)[0-9]{3}-[0-9]{4}' file
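One way to keep the four regexes separate while developing them, sketched here as an assumption rather than anything from the answer: store one pattern per line and join them with | at run time. The four alternatives use nothing Perl-specific, so this sketch runs them under grep -E; count_any_format is a hypothetical helper. Note the last group has four digits and the plain form ten, matching the question's formats.

```shell
#!/bin/sh
# One extended regex per format; a new format is just another line.
formats='[0-9]{3}-[0-9]{3}-[0-9]{4}
[0-9]{3} [0-9]{3} [0-9]{4}
[0-9]{10}
\([0-9]{3}\)[0-9]{3}-[0-9]{4}'

# Join the lines with | to build a single alternation.
pattern=$(printf '%s\n' "$formats" | paste -sd '|' -)

# Hypothetical helper: count lines matching any of the formats.
count_any_format() {
  grep -cE "$pattern"
}

printf '%s\n' '123-456-7890' '(123)456-7890' '123 456 7890' '1234567890' | count_any_format
# prints 4
printf '%s\n' '123456789' | count_any_format
# prints 0: only nine digits
```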
Try this one (it uses \d, so run it with grep -P):
grep -P '^(\d{10}|((([0-9]{3})\s){2})[0-9]{4}|((([0-9]{3})\-){2})[0-9]{4}|([(][0-9]{3}[)])[0-9]{3}[-][0-9]{4})$' file
This only handles the formats you mentioned above:
xxxxxxxxxx
xxx xxx xxxx
xxx-xxx-xxxx
(xxx)xxx-xxxx
We can list each required phone number format as a separate alternative, each anchored with ^ and $, and join them with OR. This is likely to work well, though it is tedious to write:
grep '^[0-9]\{10\}$\|^[0-9]\{3\}[-][0-9]\{3\}[-][0-9]\{4\}$\|^[0-9]\{3\}[ ][0-9]\{3\}[ ][0-9]\{4\}$\|^[(][0-9]\{3\}[)][0-9]\{3\}[-][0-9]\{4\}$' phone_number.txt
This returns lines in all of the specified formats:
- 920-702-9999
- (920)702-9999
- 920 702 9999
- 9207029999
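Because every alternative is anchored with ^ and $, a line must consist of nothing but the phone number. A quick sketch, assuming GNU grep (which treats ^ after \| as an anchor and supports \| in basic regexes); whole_line_count is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count lines that are exactly one phone number.
# Assumes GNU grep for \| in a basic regex.
whole_line_count() {
  grep -c '^[0-9]\{10\}$\|^[0-9]\{3\}[-][0-9]\{3\}[-][0-9]\{4\}$\|^[0-9]\{3\}[ ][0-9]\{3\}[ ][0-9]\{4\}$\|^[(][0-9]\{3\}[)][0-9]\{3\}[-][0-9]\{4\}$'
}

printf '%s\n' '920-702-9999' '(920)702-9999' '920 702 9999' '9207029999' | whole_line_count
# prints 4
printf '%s\n' 'Call 920-702-9999' '920-702-99999' | whole_line_count
# prints 0: extra text or extra digits on the line
```

The flip side is that this approach cannot pull numbers out of running text; for that, an unanchored pattern with grep -o is the better fit.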
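grep -P '\+?(1[ -])?((\(\d{3}\))[ -]|(\d{3}[ -]?)){2}\d{4}' file
(The leading + and the parentheses around the area code are escaped so they match literally, and \d requires grep -P.)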
works for:
123-678-1234
123 678 1234
(123)-678-1234
+1-(123)-678-1234
1-(123)-678-1234
1 123 678 1234
1 (123) 678 1234
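A check of that pattern against the listed inputs. In this sketch \d is spelled [0-9] so the pattern runs under grep -E without PCRE support; \+, \( and \) stay literal characters in an extended regex. It assumes GNU grep, and intl_count is a hypothetical helper. The last case shows that a closing parenthesis must be followed by a space or hyphen, so the question's (xxx)xxx-xxxx form is not matched.

```shell
#!/bin/sh
# Hypothetical helper: the pattern above with \d written as [0-9] for grep -E.
intl_count() {
  grep -cE '\+?(1[ -])?((\([0-9]{3}\))[ -]|([0-9]{3}[ -]?)){2}[0-9]{4}'
}

printf '%s\n' '123-678-1234' '123 678 1234' '(123)-678-1234' \
  '+1-(123)-678-1234' '1-(123)-678-1234' '1 123 678 1234' '1 (123) 678 1234' |
  intl_count
# prints 7

# No space or hyphen after the closing parenthesis, so no match.
printf '%s\n' '(123)678-1234' | intl_count
# prints 0
```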
grep -oE '\(?\<[0-9]{3}[-) ]?[0-9]{3}[ -]?[0-9]{4}\>'
Matches all your formats. The \< and \> word boundaries prevent matching numbers that are too long, such as 123-123-12345 or 1234-123-1234.
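A side-by-side sketch of what the word boundaries buy, assuming GNU grep (\< and \> are GNU extensions); with_bounds and without_bounds are hypothetical helper names. Without the boundaries, grep -o happily extracts a 10-digit slice from each too-long number.

```shell
#!/bin/sh
# Hypothetical helpers: the answer's pattern with and without \< and \>.
with_bounds() {
  grep -oE '\(?\<[0-9]{3}[-) ]?[0-9]{3}[ -]?[0-9]{4}\>'
}
without_bounds() {
  grep -oE '\(?[0-9]{3}[-) ]?[0-9]{3}[ -]?[0-9]{4}'
}

printf '%s\n' '123-123-12345' '1234-123-1234' '(123)456-7890' | with_bounds
# (123)456-7890

printf '%s\n' '123-123-12345' '1234-123-1234' | without_bounds
# 123-123-1234
# 234-123-1234
```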
I got this:
debian:tmp$ cat p.txt
333-444-5555
(333)333-6666
123 456 7890
1234567890
debian:tmp$ egrep '\(?[0-9]{3}[ )-]?[0-9]{3}[ -]?[0-9]{4}' p.txt
333-444-5555
(333)333-6666
123 456 7890
1234567890
debian:tmp$ egrep --version
GNU grep 2.5.3
Copyright (C) 1988, 1992-2002, 2004, 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
debian:tmp$