Regular expression with [ or ( in python
I need to extract IP address in the form
prosseek.amer.corp.com [10.0.40.147]
or
prosseek.amer.corp.com (10.0.40.147)
with Python. How c开发者_如何转开发an I get the IP for either case with Python? I started with something like
site = "prosseek.amer.corp.com"
m = re.search("%s.*[\(\[](\d+\.\d+\.\d+\.\d+)" % site, r)
but it doesn't work.
ADDED
m = re.search("%s.+(\(|\[)(\d+\.\d+\.\d+\.\d+)" % site, r)
m.group(2)
m = re.search(r"%s.*[([](\d+\.\d+\.\d+\.\d+)" % site, r)
m.group(1)
seems to work.
You don't need to escape meta-characters (*
, (
, )
, .
, ...) in character groups (except ]
, unless it is the first character in the character group; [][]+
would match a sequence of square brackets.)
Another tip when it comes to Python is to use r'...'
-style strings. With them, backslashes has no special meaning. r'\\'
would print \\
, since backslash has no special meaning:
m = re.search(r"%s.*[([](\d+\.\d+\.\d+\.\d+)" % site, r)
In the above string it doesn't make any difference though, since \d
doesn't mean anything in Python, but when it comes to stuff like \r
, \\
, etc., it makes lives easier.
Use
[([]
The characters inside the outer brackets are taken literally. You do not need to escape them with a backslash.
For example:
import re
site = "prosseek.amer.corp.com "
m = re.search("%s\s*[([](\d+\.\d+\.\d+\.\d+)" % site, 'prosseek.amer.corp.com (10.0.40.147)')
I'd like to suggest a few slight refinements to what you have:
site = "prosseek.amer.corp.com"
m = re.search(r"%s\s+[([](\d+\.\d+\.\d+\.\d+)" % re.escape(site), r)
m.group(2)
The changes are:
- Pass
site
throughre.escape
so that it is interpreted literally; otherwise the dots in the domain name can match any character. This is extra important ifsite
comes from user input; you don't want someone to be able to stick a regular expression in there and break your program. - Use
\s+
instead of.+
in between the site name and the IP address, so that it only accepts whitespace.
re.findall("(?:\d{1,3}\.){3}\d{1,3}", site)
How about you just ignore the brackets?
site = "prosseek.amer.corp.com"
m = re.search("%s.*(\d+\.\d+\.\d+\.\d+)" % site, r)
import string
site='prosseek.amer.corp.com (10.0.40.147)'
''.join([c for c in site if c not in string.ascii_letters+' []()']).strip('.')
For some reason I like this better than regex
精彩评论