Regular Expressions - testing if a String contains another String
Suppose you have the following string (one line):
10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"
and you want to extract the part between the GET and HTTP (i.e., the URL), but only if it contains the word 'puzzle'. How would you do that using regular expressions in Python?
Here's my solution so far.
match = re.search(r'GET (.*puzzle.*) HTTP', my_string)
It works, but I have a feeling that I should change the first, the second, or both .* to .*? to make them non-greedy. Does it actually matter in this case?
There's no need for a regex here:
>>> s
'10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'
>>> s.split("HTTP")[0]
'10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ '
>>> if "puzzle" in s.split("HTTP")[0].split("GET")[-1]:
...     print("found puzzle")
...
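The same splitting idea can be wrapped in a small function that returns the URL itself. This is only a sketch: the name url_if_puzzle is made up for illustration, and the User-Agent in the sample line is shortened.

```python
def url_if_puzzle(line):
    """Return the text between 'GET' and the first 'HTTP' if it contains 'puzzle', else None."""
    if "GET" not in line or "HTTP" not in line:
        return None
    # Same splitting as in the session above; strip() drops the surrounding spaces.
    url = line.split("HTTP")[0].split("GET")[-1].strip()
    return url if "puzzle" in url else None


# Sample line from the question, with a shortened (hypothetical) User-Agent.
log_line = '10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0"'

print(url_if_puzzle(log_line))                              # URL has no 'puzzle'
print(url_if_puzzle(log_line.replace("keyser", "puzzle")))  # URL contains 'puzzle'
```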
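It does matter. The User-Agent can contain anything, so a greedy .* may run past the request line and stop only at a later " HTTP" further along in the string. Use the non-greedy form for both of them.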
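To see the difference, here is a small comparison on made-up log lines whose User-Agent happens to contain " HTTP". The lines and the stricter \S* pattern are my own illustration, not part of the original answer. Note that the non-greedy version can still match across the request line when 'puzzle' appears only in the User-Agent; restricting the match to non-whitespace characters avoids that, assuming the URL contains no spaces.

```python
import re

# Hypothetical line: 'puzzle' is in the URL, and the User-Agent contains ' HTTP'.
line = ('10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] '
        '"GET /puzzle/22300/ HTTP/1.0" 302 528 "-" "MyBot/1.0 (speaks HTTP too)"')

greedy = re.search(r'GET (.*puzzle.*) HTTP', line)
lazy = re.search(r'GET (.*?puzzle.*?) HTTP', line)
strict = re.search(r'GET (\S*puzzle\S*) HTTP', line)

print(greedy.group(1))  # runs on into the User-Agent
print(lazy.group(1))    # stops at the first ' HTTP'
print(strict.group(1))  # cannot cross whitespace at all

# Hypothetical line: 'puzzle' appears only in the User-Agent.
other = ('10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] '
         '"GET /keyser/22300/ HTTP/1.0" 302 528 "-" "puzzle-bot (speaks HTTP too)"')

lazy_other = re.search(r'GET (.*?puzzle.*?) HTTP', other)
strict_other = re.search(r'GET (\S*puzzle\S*) HTTP', other)

print(lazy_other is not None)  # the non-greedy pattern still finds a (wrong) match
print(strict_other is None)    # the \S* pattern correctly finds nothing
```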
>>> s = '10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'
>>> s.split()[6]
'/keyser/22300/'
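Combined with the 'puzzle' check, the whitespace split might look like the sketch below. It assumes every line has the same layout as the sample (the request path is the seventh whitespace-separated field, right after '"GET'); other log formats would need a different index. The function name is hypothetical.

```python
def puzzle_path(line):
    """Return the request path if it is a GET request containing 'puzzle', else None."""
    fields = line.split()
    # In the sample layout, fields[5] is '"GET' and fields[6] is the path.
    if len(fields) > 6 and fields[5] == '"GET' and "puzzle" in fields[6]:
        return fields[6]
    return None


s = '10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'

print(puzzle_path(s))                              # path has no 'puzzle'
print(puzzle_path(s.replace("keyser", "puzzle")))  # path contains 'puzzle'
```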