
Regular Expressions - testing if a String contains another String

Suppose you have this string (all on one line):

10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"

and you want to extract the part between the GET and HTTP (i.e., the URL), but only if it contains the word 'puzzle'. How would you do that using regular expressions in Python?

Here's my solution so far.

match = re.search(r'GET (.*puzzle.*) HTTP', my_string)

It works, but I suspect I should change the first, the second, or both .* to .*? so that they are non-greedy. Does it actually matter in this case?


You don't need a regex for this:

>>> s
'10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'

>>> s.split("HTTP")[0]
'10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ '

>>> if "puzzle" in s.split("HTTP")[0].split("GET")[-1]:
...   print("found puzzle")
...


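It does matter. The User-Agent can contain anything, including another ' HTTP', in which case a greedy .* runs past the URL and into the rest of the line. Use the non-greedy form for both of them.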
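Here is a small sketch of how the variants behave when the User-Agent itself contains ' HTTP'. Both log lines are made up for illustration, and the \S* pattern is an extra alternative not proposed above; it assumes the URL never contains whitespace.

```python
import re

# Hypothetical log lines (not from the original question).
# 1) URL contains 'puzzle', and the User-Agent happens to contain ' HTTP'.
ua_http = ('1.2.3.4 - - [06/Aug/2007:00:12:20 -0700] '
           '"GET /puzzle/1/ HTTP/1.0" 200 10 "-" "Some HTTP client"')
# 2) URL has no 'puzzle', but the User-Agent does.
tricky = ('1.2.3.4 - - [06/Aug/2007:00:12:20 -0700] '
          '"GET /keyser/ HTTP/1.0" 302 528 "-" "puzzle-bot HTTP client"')

GREEDY = r'GET (.*puzzle.*) HTTP'
LAZY = r'GET (.*?puzzle.*?) HTTP'
STRICT = r'GET (\S*puzzle\S*) HTTP'   # assumption: URLs contain no whitespace


def extract(pattern, line):
    """Return the captured group, or None if the pattern does not match."""
    m = re.search(pattern, line)
    return m.group(1) if m else None


# Greedy overshoots into the User-Agent; lazy and strict stop at the URL.
greedy_1 = extract(GREEDY, ua_http)   # '/puzzle/1/ HTTP/1.0" 200 10 "-" "Some'
lazy_1 = extract(LAZY, ua_http)       # '/puzzle/1/'
strict_1 = extract(STRICT, ua_http)   # '/puzzle/1/'

# Even the lazy form is fooled when only the User-Agent contains 'puzzle',
# because .*? is still allowed to cross spaces to reach it.
lazy_2 = extract(LAZY, tricky)        # '/keyser/ HTTP/1.0" 302 528 "-" "puzzle-bot'
strict_2 = extract(STRICT, tricky)    # None
```

So non-greedy quantifiers fix the overshoot in the first case, but restricting the match to non-whitespace characters is what keeps it inside the URL in the second.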


>>> s = '10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] "GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"'
>>> s.split()[6]
'/keyser/22300/'
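The split above only extracts the path; to also apply the 'puzzle' condition from the question, it could be wrapped roughly like this. The function name puzzle_url is hypothetical, and the sketch assumes every line follows the log format shown, with the request method in field 5 and the path in field 6.

```python
def puzzle_url(line):
    """Return the request path if it contains 'puzzle', else None."""
    fields = line.split()
    # Assumed layout: fields[5] is '"GET' and fields[6] is the request path.
    if len(fields) > 6 and fields[5] == '"GET' and 'puzzle' in fields[6]:
        return fields[6]
    return None


s = ('10.254.254.28 - - [06/Aug/2007:00:12:20 -0700] '
     '"GET /keyser/22300/ HTTP/1.0" 302 528 "-" "Mozilla/5.0"')

no_match = puzzle_url(s)                                  # None
match = puzzle_url(s.replace('/keyser/', '/puzzle/'))     # '/puzzle/22300/'
```

Because the check looks only at field 6, a 'puzzle' elsewhere in the line (for example in the User-Agent) cannot cause a false positive.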