Scraping data off the National Vulnerability Database: can't figure out how to click a button (Mechanize + Python)

I am trying to scrape some data off the National Vulnerability Database (http://web.nvd.nist.gov). What I want to do is enter a search term, which brings up the first 20 results, and scrape that data. Then I want to click "next 20" repeatedly until I have traversed all the results.

I am able to successfully submit search terms, but clicking "next 20" is not working at all.

Tools I am using: Python + Mechanize

Here is my code:

import mechanize

# Browser
b = mechanize.Browser()

# The URL to this service
URL = 'http://web.nvd.nist.gov/view/vuln/search'
Search = ['Linux', 'Mac OS X', 'Windows']

def searchDB():
    SearchCounter=0
    for i in Search:
        # Load the page
        read = b.open(URL)
        # Select the form
        b.select_form(nr=0)
        # Fill out the search form
        b['vulnSearchForm:text'] = Search[int(SearchCounter)] 
        b.submit('vulnSearchForm:j_id120')
        result=b.response().read()
        file=open(Search[SearchCounter]+".txt","w")
        file.write(result)

        '''Here is where the problem is. vulnResultsForm:j_id116 is the value of the "next 20" button.'''
        b.select_form(nr = 0)
        b.form.click('vulnResultsForm:j_id116')
        result=b.response().read()

if __name__ == '__main__':
    searchDB()


From the docstring of b.form.click:

Return request that would result from clicking on a control.

The request object is a urllib2.Request instance, which you can pass to urllib2.urlopen (or ClientCookie.urlopen).

So:

request = b.form.click('vulnResultsForm:j_id116')
b.open(request)
result = b.response().read()
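
To walk through every page of results, the same pattern can be repeated in a loop. The following is only a sketch under several assumptions: fetch_all_pages, max_pages and not_found are hypothetical names introduced here, the results form is assumed to be the first form on the page (as in the question), and a missing "next 20" button is assumed to surface as a ValueError (mechanize's ControlNotFoundError is, as far as I know, a ValueError subclass). The site may instead keep the button on the last page, so the max_pages cap acts as a safety net.

```python
def fetch_all_pages(browser, next_button, max_pages=50, not_found=ValueError):
    """Return the body of the current page plus every page reached by
    repeatedly clicking `next_button` (hypothetical helper, not part of mechanize).
    """
    # The browser is expected to already hold the first results page.
    pages = [browser.response().read()]
    while len(pages) < max_pages:
        # Assumption: the form holding the button is the first form on the page.
        browser.select_form(nr=0)
        try:
            # click() only builds the request; it does not send anything.
            request = browser.form.click(next_button)
        except not_found:
            # Assumption: no such control means this was the last page.
            break
        # Actually send the request, then read the new page.
        browser.open(request)
        pages.append(browser.response().read())
    return pages
```

In the question's code this would be called right after b.submit(...), for example fetch_all_pages(b, 'vulnResultsForm:j_id116'), and each returned page could then be written to the output file.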


I haven't used Mechanize outside of zope.testbrowser, which is based on Mechanize, so there may be differences, but here goes:

You are clicking on the form. Try getting the button and clicking on that instead. Something like this, I think:

form.find_control("j_id120").click()

Also:

b['vulnSearchForm:text'] = Search[int(SearchCounter)] 

Can be replaced with

b['vulnSearchForm:text'] = i

since i already contains the value. Python is not JavaScript; loop variables are not numbers (unless you want them to be).
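
To illustrate the point about loop variables, here is a small sketch. The names search_terms, output_names and numbered are made up for this example; only the list contents and the ".txt" suffix come from the question.

```python
search_terms = ['Linux', 'Mac OS X', 'Windows']

def output_names(terms):
    # Iterating over a list yields the items themselves, so no counter is needed.
    return [term + ".txt" for term in terms]

def numbered(terms):
    # enumerate() yields (index, item) pairs for the cases where the position matters.
    return ["%d: %s" % (index, term) for index, term in enumerate(terms)]
```

So in the question's loop, both Search[int(SearchCounter)] and the SearchCounter variable can go away; i (or a more descriptive name) is already the search term.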
