Regular Expression for finding phone numbers [duplicate]

2022-12-31 11:49 Q&A author:

This question already has answers here: Closed 12 years ago.

Possible Duplicates:
A comprehensive regex for phone number validation
grep with regex for phone number

Hello Everyone,

I am new to Stack Overflow and I have a quick question. Let's assume we are given a large number of HTML files (large as in theoretically infinite). How can I use regular expressions to extract the list of phone numbers from all those files?

An explanation of the expression would be really appreciated. The phone numbers can be in any of the following formats:

(123) 456 7899
(123).456.7899
(123)-456-7899
123-456-7899
123 456 7899
1234567899

Thanks a lot for all your help and have a good one!

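/^[-.() ]*([0-9]{3})[-.() ]*([0-9]{3})[-.() ]*([0-9]{4})$/

This should accomplish what you are trying to do.

The leading ^ means "start of the line" and the trailing $ means "end of the line". Together they force the pattern to account for the whole string.

The [-.() ]* that I have in there means "any period, hyphen, parenthesis, or space appearing 0 or more times". The hyphen goes first inside the brackets so it is taken literally. Written as [\.-)( ], the .-) part would be read as a character range from . to ), which engines such as JavaScript's and Python's reject as out of order.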

The ([0-9]{3}) groups each capture three digits (the last one uses {4} to capture four).

Hope that helps!

Without knowing what language you're using I am unsure whether or not the syntax is correct.
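A minimal JavaScript sketch of this first approach (JavaScript is an assumption, since the question names no language). The separator class is written as [-.() ] with the hyphen first so it is a literal character rather than a range; normalizePhone is a hypothetical helper name.

```javascript
// First answer's pattern, with the separator class written hyphen-first
// so "-" is a literal character and not a range.
const phoneFirst = /^[-.() ]*([0-9]{3})[-.() ]*([0-9]{3})[-.() ]*([0-9]{4})$/;

// Hypothetical helper: returns the ten digits if the whole string matches,
// otherwise null.
function normalizePhone(s) {
  const m = phoneFirst.exec(s);
  return m ? m[1] + m[2] + m[3] : null;
}

const formats = [
  "(123) 456 7899",
  "(123).456.7899",
  "(123)-456-7899",
  "123-456-7899",
  "123 456 7899",
  "1234567899",
];

for (const f of formats) {
  console.log(`${f} -> ${normalizePhone(f)}`); // each line ends in 1234567899
}
```

The separator runs are unrestricted, so the pattern is permissive: normalizePhone(")123(456 7899") also returns "1234567899".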

This should match all of your groups with very few false positives:

/\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/

The groups you will be interested in after the match are groups 1, 3, and 4. Group 2 exists only to make sure the first and second separators (a space, a period, a hyphen, or nothing at all) are the same.
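A short JavaScript sketch (language assumed) of how the backreference behaves: digitsOf is a hypothetical helper that joins groups 1, 3 and 4, and a number whose two separators differ is not matched at all.

```javascript
// Second answer's pattern, unanchored. Group 2 captures the first separator
// (space, period, hyphen, or nothing) and \2 demands the same one again.
const phoneBackref = /\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/;

// Hypothetical helper: joins the digit groups 1, 3 and 4, or returns null.
function digitsOf(s) {
  const m = phoneBackref.exec(s);
  return m ? m[1] + m[3] + m[4] : null;
}

console.log(digitsOf("(123).456.7899")); // 1234567899
console.log(digitsOf("123-456.7899"));   // null: the separators differ
```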

For example, a sed command that strips the parentheses and separators and leaves phone numbers in the form 1234567899:

sed "s/(\{0,1\}\([0-9]\{3\}\))\{0,1\}\([ .-]\{0,1\}\)\([0-9]\{3\}\)\2\([0-9]\{4\}\)/\1\3\4/"
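For comparison, a JavaScript equivalent of that substitution (an illustration, not part of the answer). One difference worth knowing: without a trailing g flag, sed replaces only the first match on each line, whereas the /g flag below replaces every match in the string.

```javascript
// Same pattern as the sed command, with the g flag so every number is rewritten.
const phoneAll = /\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})/g;

const line = "Call (123) 456 7899 or 987-654-3210 today.";
// $1$3$4 plays the role of sed's \1\3\4 in the replacement string.
const digitsOnly = line.replace(phoneAll, "$1$3$4");

console.log(digitsOnly); // Call 1234567899 or 9876543210 today.
```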

Here are the false positives of my expression:

(123)4567899
(1234567899
(123 456 7899
(123.456.7899
(123-456-7899
123)4567899
123) 456 7899
123).456.7899
123)-456-7899

Breaking the expression up into two parts, one that matches with parentheses and one that does not, eliminates all of these false positives except the first one (as long as the pattern has to match the whole string):

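/\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4})/

Groups 1, 3, and 4 or groups 5, 7, and 8 hold the digits in this case. Group 6 is the second alternative's separator, so that alternative's backreference is \6.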
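A JavaScript check (language assumed) comparing the single and split expressions on unbalanced-parenthesis inputs like those listed, written with ten digits. Both patterns are anchored here with ^ and $, which is an assumption so that only whole strings count; the split version is wrapped in a non-capturing group (?:...) so the anchors apply to both alternatives without shifting the group numbers, and its second alternative's separator is referenced as \6, its own group. digitsFromSplit is a hypothetical helper.

```javascript
// Single expression: \(? and \)? are independent, so unbalanced forms pass.
const single = /^\(?([0-9]{3})\)?([ .-]?)([0-9]{3})\2([0-9]{4})$/;

// Split expression: one alternative with parentheses, one without.
// Group 6 is the second alternative's separator, hence \6.
const split =
  /^(?:\(([0-9]{3})\)([ .-]?)([0-9]{3})\2([0-9]{4})|([0-9]{3})([ .-]?)([0-9]{3})\6([0-9]{4}))$/;

const unbalanced = [
  "(123)4567899",
  "(1234567899",
  "(123 456 7899",
  "(123.456.7899",
  "(123-456-7899",
  "123)4567899",
  "123) 456 7899",
  "123).456.7899",
  "123)-456-7899",
];

console.log(unbalanced.filter((s) => single.test(s)).length); // 9
console.log(unbalanced.filter((s) => split.test(s)));         // [ '(123)4567899' ]

// Hypothetical helper: exactly one alternative participates in a match,
// so read groups 1, 3, 4 or groups 5, 7, 8.
function digitsFromSplit(s) {
  const m = split.exec(s);
  if (!m) return null;
  return m[1] !== undefined ? m[1] + m[3] + m[4] : m[5] + m[7] + m[8];
}

console.log(digitsFromSplit("(123) 456 7899")); // 1234567899
console.log(digitsFromSplit("123-456-7899"));   // 1234567899
```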

This will help you catch the ones with an area code in parentheses:

([0-9]\{3\})[ .-][0-9]\{3\}[ .-][0-9]\{4\}

The others are:

[0-9]\{3\}[ -][0-9]\{3\}[ -][0-9]\{4\}
[0-9]\{10\}

I kept the first pattern separate from the second because combining them into one pattern with optional parentheses would also accept unbalanced forms such as (123 456 7890 or 123) 456 7890.

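Note also that with grep on my terminal I had to escape the { } for the repetition, because grep uses basic regular expressions by default (with grep -E the braces need no backslashes, but the literal parentheses then do). You may not have to, or you may have to escape other characters, depending on where you intend to use this.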
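The same three patterns in JavaScript syntax, as a sketch (the language and the looksLikePhone helper are assumptions): the braces lose their backslashes, and the parentheses, which are literal in grep's basic syntax, must now be escaped.

```javascript
// grep BRE -> JavaScript: \{3\} becomes {3}; literal ( ) become \( \).
const withParens = /\([0-9]{3}\)[ .-][0-9]{3}[ .-][0-9]{4}/;
const withoutParens = /[0-9]{3}[ -][0-9]{3}[ -][0-9]{4}/;
const tenDigits = /[0-9]{10}/;

// Hypothetical helper: true if any of the three patterns occurs in s.
function looksLikePhone(s) {
  return [withParens, withoutParens, tenDigits].some((re) => re.test(s));
}

console.log(looksLikePhone("(123).456.7899")); // true
console.log(looksLikePhone("123.456.7899"));   // false: no "." without parentheses
console.log(looksLikePhone("(123 456 7899"));  // true: "123 456 7899" occurs inside it
```

Like grep, these patterns search within a line rather than matching it whole, so the last call still succeeds. Anchoring each pattern with ^ and $ (or running grep -x) is what actually rejects the unbalanced forms.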

^(\(?\d{3}\)?)([ .-])(\d{3})([ .-])(\d{4})$

This should match all of the formats except the last one. For that one you could use a separate pattern, ^\d{10}$.

There is one flaw, though: it will also match (123 456 7899.

If we break this pattern down:

1. ^(\(?\d{3}\)?): the ^ matches the beginning of the text, and \d{3} matches three digits. \(? and \)? each accept their parenthesis or not, independently of each other, and that is the problem: you would have to check whether there was an opening parenthesis and, if there was, require the closing one. I don't know whether that is possible with a regex alone.
2. ([ .-]) matches one space, period, or hyphen, exactly once.
3. (\d{3}) matches three digits.
4. ([ .-]) is the same as 2.
5. (\d{4})$ matches four digits followed by the end of the text ($).
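A JavaScript sketch (language assumed) showing the flaw, plus one way around it that this answer does not use: spelling out the two area-code forms with alternation, (\(\d{3}\)|\d{3}), so a parenthesis is accepted only as part of a complete pair. The names loose, paired and tenOnly are illustrative.

```javascript
// The answer's pattern: \(? and \)? are independent optional characters.
const loose = /^(\(?\d{3}\)?)([ .-])(\d{3})([ .-])(\d{4})$/;

// Alternation variant: either "(123)" or "123", never half of a pair.
const paired = /^(\(\d{3}\)|\d{3})([ .-])(\d{3})([ .-])(\d{4})$/;

// Separate pattern for the unpunctuated format.
const tenOnly = /^\d{10}$/;

console.log(loose.test("(123 456 7899"));   // true: the flaw described above
console.log(paired.test("(123 456 7899"));  // false
console.log(paired.test("(123) 456 7899")); // true
console.log(tenOnly.test("1234567899"));    // true
```

Like the original, paired checks each separator independently, so 123-456.7899 is still accepted.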

Since you want to extract numbers from HTML pages, you would have to drop the ^ and $ so the pattern can match anywhere in the text, and set the global flag (in JavaScript, /exp/g).
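A minimal JavaScript sketch of that extraction step, assuming each file's HTML has already been read into a string (file handling is left out). It joins this answer's two patterns into one alternation with the anchors removed and the g flag set; extractPhones is a hypothetical name.

```javascript
// This answer's patterns without ^ and $, joined by |, with the g flag.
const phoneInText = /(\(?\d{3}\)?)([ .-])(\d{3})([ .-])(\d{4})|\d{10}/g;

// Hypothetical helper: every phone-like substring in an HTML string.
function extractPhones(html) {
  // matchAll requires the g flag and yields one match object per hit.
  return Array.from(html.matchAll(phoneInText), (m) => m[0]);
}

const page =
  "<p>Office: (123) 456 7899</p><p>Fax: 123-456-7899</p><span>1234567899</span>";

console.log(extractPhones(page));
// [ '(123) 456 7899', '123-456-7899', '1234567899' ]
```

Because nothing is anchored, \d{10} also fires inside any longer run of digits, and a number split by markup (for example 123-<b>456</b>-7899) is not found. Writing the digit-only alternative as \b\d{10}\b, or stripping tags before matching, are common refinements.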

You can test Regex here
