Python: using a regular expression to match one line of HTML

This simple Python method I put together just checks to see if Tomcat is running on one of our servers.

import urllib2
import re
import sys

def tomcat_check():

    tomcat_status = urllib2.urlopen('http://10.1.1.20:7880')
    results = tomcat_status.read()
    pattern = re.compile('<body>Tomcat is running...</body>',re.M|re.DOTALL)
    q = pattern.search(results)
    if q is None:  # search() returns None, not [], when there is no match
        notify_us()
    else:
        print("Tomcat appears to be running")
    sys.exit()

If this line is not found:

<body>Tomcat is running...</body>

it calls:

notify_us()

which uses SMTP to send an email message to myself and another admin saying that Tomcat is no longer running on the server.

I have not used the re module in Python before, so I assume there is a better way to do this. I am also open to a more graceful solution with Beautiful Soup, but I haven't used that either.

Just trying to keep this as simple as possible...


Why use a regex here at all? Why not just a simple string search?

if '<body>Tomcat is running...</body>' not in results:
    notify_us()


if 'Tomcat is running' not in results:
    notify_us()


There are lots of different methods:

str.find()

if results.find("Tomcat is running...") != -1:
    print "Tomcat appears to be running"
else:
    notify_us()

Using X in Y

if "Tomcat is running..." in results:
    print "Tomcat appears to be running"
else:
    notify_us()

Using Regular Expressions

if re.search(r"Tomcat is running\.\.\.", results):
    print "Tomcat appears to be running"
else:
    notify_us()

Personally, I prefer the membership operator to test if the string is in another string.
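
To make the comparison concrete, here is a minimal sketch that runs the three checks side by side on two made-up page bodies. It is written for Python 3, which is an assumption (the snippets above are Python 2), and the helper names and sample strings are hypothetical. Note that re.search() returns None, not an empty list, when nothing matches, and that re.escape() keeps the dots literal.

```python
import re

MARKER = "Tomcat is running..."

def check_with_find(results):
    # str.find() returns the index of the match, or -1 when it is absent
    return results.find(MARKER) != -1

def check_with_in(results):
    # the membership operator gives a plain True/False
    return MARKER in results

def check_with_regex(results):
    # re.escape() makes the dots literal; re.search() returns None on no match
    return re.search(re.escape(MARKER), results) is not None

# Hypothetical page bodies, used only for illustration
up = "<html><body>Tomcat is running...</body></html>"
down = "<html><body>Service unavailable</body></html>"

for check in (check_with_find, check_with_in, check_with_regex):
    print(check.__name__, check(up), check(down))
```

All three agree on these inputs, so for a fixed string the choice is mostly a matter of readability.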


Since you appear to be looking for a fixed string (not a regexp) that you have some control over and can be expected to be consistent, str.find() should do just fine. Or what Daniel said.
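
One thing to watch with str.find(): its result is an index, not a boolean, so it has to be compared with -1. A short Python 3 illustration, using a made-up page string:

```python
page = "Tomcat is running..."  # hypothetical page body

hit = page.find("Tomcat")   # match at the very start, so the index is 0
miss = page.find("Jetty")   # no match, so the result is -1

# Testing the raw result for truthiness gets both cases wrong:
# 0 is falsy even though the text was found, and -1 is truthy.
print(hit, bool(hit))
print(miss, bool(miss))

# Comparing with -1 gives the intended answer
print(hit != -1, miss != -1)
```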


As you have mentioned, regular expressions aren't suited to parsing XML-like structures (at least not for more complex queries). I would do something like this:

from lxml import etree
import urllib2

def tomcat_check(host='127.0.0.1', port=7880):
    response = urllib2.urlopen('http://%s:%d' % (host, port))
    html = etree.HTML(response.read())
    return html.findtext('.//body') == 'Tomcat is running...'

if tomcat_check('10.1.1.20'):
    print 'Tomcat is running...'
else:
    # notify someone, e.g. with the notify_us() function from the question
    notify_us()
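
Whichever way the page is inspected, it may be worth remembering that when Tomcat is actually down the request itself will probably fail before any text can be checked. A rough sketch of handling that case, assuming Python 3 (where urllib.request replaces urllib2); the names fetch, tomcat_is_up and EXPECTED are hypothetical, and the fetcher parameter exists only so the check can be tried without a network:

```python
import urllib.error
import urllib.request

EXPECTED = "Tomcat is running..."

def fetch(url, timeout=5):
    """Return the page body as text, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # connection refused, timeout, HTTP error status, and similar failures
        return None

def tomcat_is_up(url, fetcher=fetch):
    # An unreachable server counts as "not running", same as a wrong page
    body = fetcher(url)
    return body is not None and EXPECTED in body

# Stub fetchers stand in for the real server here
print(tomcat_is_up("http://example.invalid", fetcher=lambda url: "<body>Tomcat is running...</body>"))
print(tomcat_is_up("http://example.invalid", fetcher=lambda url: None))
```

In real use, tomcat_is_up('http://10.1.1.20:7880') would be called with the default fetcher, and notify_us() would run whenever it returns False.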
