regular expression to parse IP
I am after a regular expression to parse IP addresses and extract their host, port, username, and password.
Here are the formats I am interested in:
- 12.34.56.789
- http://12.34.56.789:80
- http://12.34.56.789
- 12.34.56.789:80
- http://login:password@12.34.56.789:80
Try something like this:
(http://(\w+:\w+@)?)?(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?
Explanation:
(http://(\w+:\w+@)?)? - optional group of http:// followed by optional user:pass@
(\d{1,3}\.){3} - three groups of one to three digits followed by a dot
\d{1,3} - one to three digits
(:\d{1,5})? - optional group of colon followed by one to five digits
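As a rough sketch of how that pattern could be used to pull the parts out, here is a Python version with named groups. The group names (user, password, host, port) and the helper parse are assumptions added for illustration; the pattern itself is otherwise the one above.

```python
import re

# Same structure as the pattern above, with named groups added (names are assumptions).
URL_IP = re.compile(
    r"(?:http://(?:(?P<user>\w+):(?P<password>\w+)@)?)?"
    r"(?P<host>(?:\d{1,3}\.){3}\d{1,3})"
    r"(?::(?P<port>\d{1,5}))?"
)

def parse(text):
    """Return the captured parts as a dict, or None if the whole string does not match."""
    m = URL_IP.fullmatch(text)
    return m.groupdict() if m else None

if __name__ == "__main__":
    samples = [
        "12.34.56.789",
        "http://12.34.56.789:80",
        "http://12.34.56.789",
        "12.34.56.789:80",
        "http://login:password@12.34.56.789:80",
    ]
    for sample in samples:
        print(sample, "->", parse(sample))
```

Note that this only checks the shape of the address. It does not check that each number is in range, so 12.34.56.789 is accepted.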
Doing the match this way may not be a best practice. It might be better to plug into some sort of code with real smarts in it, that can do general-purpose URI parsing. If you have limited needs, though, and can comment/document thoroughly that your code will break if you demand more of it, then maybe it makes sense to go down this path.
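For comparison, here is a minimal sketch of the library route in Python, using urllib.parse.urlsplit for the URI parts and the ipaddress module for validation. The helper names split_address and is_valid_ipv4 are made up for this example, and prefixing "//" to scheme-less input is an assumption about how you want bare addresses handled.

```python
import ipaddress
from urllib.parse import urlsplit

def split_address(text):
    """Split an address into host, port, username and password using urlsplit."""
    # urlsplit only treats what follows "//" as the network location,
    # so add it when the input has no scheme.
    if "://" not in text:
        text = "//" + text
    parts = urlsplit(text)
    return {
        "host": parts.hostname,
        "port": parts.port,  # int or None; raises ValueError if not a valid port
        "username": parts.username,
        "password": parts.password,
    }

def is_valid_ipv4(host):
    """Return True if host is a syntactically valid IPv4 address."""
    try:
        ipaddress.IPv4Address(host)
        return True
    except ValueError:
        return False

if __name__ == "__main__":
    info = split_address("http://login:password@12.34.56.789:80")
    print(info, is_valid_ipv4(info["host"]))
```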
The simplest way is to match four sets of 1 to 3 digits, with:
- optionally, one-or-more not-:, plus :, plus one-or-more not-@, plus @
- optionally, :, plus 1 to 5 digits
Something like:
([^:]+:[^@]+@)?(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?
But this would accept silly stuff, like "999.999.999.999:99999"
If you only want to accept valid IP addresses, and don't care that it happens to be part of a URI, or don't care what other garbage exists in the string, here is an example:
http://www.regular-expressions.info/examples.html
It basically matches four sets of:
- 2, plus 0-4, plus 0-9
- or 2, plus 5, plus 0-5
- or 1, plus 0-9, plus 0-9
- or 1-9, plus 0-9
- or 0-9
That should get you started. You would then add:
- optionally, one-or-more not-:, plus :, plus one-or-more not-@, plus @ (max lengths may be interesting, here)
- optionally, :, plus 0-65535 (this I'll leave up to you, based on the 0-255 rules above)
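Putting those rules together might look roughly like the Python sketch below. The 0-65535 port alternation is one possible way to write it following the same idea as the 0-255 rules, and the names OCTET, PORT, STRICT and strict_parse are assumptions for this example. The longest alternatives are listed first, and the match is anchored with fullmatch.

```python
import re

# One octet, 0-255, longest alternatives first.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
# Port, 0-65535, built the same way.
PORT = (
    r"(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}"
    r"|6[0-4][0-9]{3}|[1-5][0-9]{4}|[0-9]{1,4})"
)

STRICT = re.compile(
    r"(?:(?P<user>[^:]+):(?P<password>[^@]+)@)?"   # optional user:password@
    rf"(?P<host>{OCTET}(?:\.{OCTET}){{3}})"        # four octets separated by dots
    rf"(?::(?P<port>{PORT}))?"                     # optional :port
)

def strict_parse(text):
    """Return the captured parts, or None if any part is out of range."""
    m = STRICT.fullmatch(text)
    return m.groupdict() if m else None

if __name__ == "__main__":
    for sample in ["login:password@10.48.0.200:80", "999.999.999.999:99999"]:
        print(sample, "->", strict_parse(sample))
```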
There are other range-based rules for matching IP addresses that you might want to avoid (stuff like 0.0.0.0, and reserved ranges), but it may be easier to do subsequent matching for these.
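If you do that subsequent check in Python, the standard ipaddress module already knows about these ranges, so a second pass could be sketched like this. The function name classify and the choice of which flags to report are assumptions for this example.

```python
import ipaddress

def classify(host):
    """Return the special-range flags that apply to an IPv4 address string."""
    ip = ipaddress.IPv4Address(host)  # raises ValueError for an invalid address
    checks = [
        ("unspecified", ip.is_unspecified),  # 0.0.0.0
        ("loopback", ip.is_loopback),        # 127.0.0.0/8
        ("private", ip.is_private),
        ("reserved", ip.is_reserved),
        ("multicast", ip.is_multicast),
        ("link-local", ip.is_link_local),
    ]
    return [name for name, flag in checks if flag]

if __name__ == "__main__":
    for host in ["0.0.0.0", "127.0.0.1", "10.48.0.200", "8.8.8.8"]:
        print(host, classify(host))
```

An empty list means none of the listed special ranges apply.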
Basically, I'd suggest you use the very-simple example, or plug into a library.
You can start with this (Python):
import re
# Raw string so the backslashes reach the regex engine unchanged;
# the dots are escaped so they match only a literal "."
pattern = r"((?P<login>\w+):(?P<password>\w+)@)?(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?P<port>\d+))?"
re.match(pattern, "12.34.56.789").groupdict()
re.match(pattern, "12.34.56.789:80").groupdict()
re.match(pattern, "john:pass@12.34.56.789:80").groupdict()
And obviously, the IP you specified is not valid, since 789 is greater than 255 (as Matt says ...)
Here is a small script whipped up in Perl that does the following things: a) strips out the username and password, after checking that the username starts with a letter; b) validates the IP address; c) validates the port.
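#!/usr/bin/perl
# Reads one address per line from stdin or from the files named on the command line.
while (<>) {
    chomp;
    # $1 = username (must start with a letter), $2 = password, $3 = full IP,
    # $4..$7 = the four octets, $8 = port
    if (/(?:(?:([a-zA-Z]\w+)\:(\w+))@)?((\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}))(?:\:(\d{1,5}))?/) {
        print "username=$1\n";
        print "password=$2\n";
        print "ip address=$3\n";
        print "port=$8\n";
        print "Warning: IP Address invalid\n" if ($4>255||$5>255||$6>255||$7>255);
        print "Warning: Port Address invalid\n" if ($8>65535);
    }
}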
EDIT: Recommendation from tchrist below
Regexlib would be a helpful resource for your question. You can find many solutions there (you may need to combine some).
To match only a valid IP address, use
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}
instead of
([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}
because many regex engines take the first alternative that matches in an alternation, so an unanchored search with the second pattern can stop early at 10.48.0.20.
You can test your regex engine with: 10.48.0.200
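To see the difference with that test address, here is a small Python check using the two patterns exactly as given above. The variable names are only for this example, and Python's re module is just one engine that behaves this way.

```python
import re

# Longest alternatives first (recommended above).
LONGEST_FIRST = (
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
    r"(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"
)
# Shortest alternative first (the version to avoid).
SHORTEST_FIRST = (
    r"([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])"
    r"(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}"
)

sample = "10.48.0.200"
first = re.search(LONGEST_FIRST, sample).group(0)
second = re.search(SHORTEST_FIRST, sample).group(0)

print(first)   # 10.48.0.200
print(second)  # 10.48.0.20 (the last octet is cut short)
```

Anchoring the pattern (for example with re.fullmatch) forces the engine to backtrack into the later alternatives, so the ordering matters mostly for unanchored searches.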