urllib2 returns a different page than the browser does?

I'm trying to scrape a page (my router's admin page), but the device seems to be serving a different page to urllib2 than to my browser. Has anyone seen this before? How can I get around it?

This is the code I'm using:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> page = urllib2.urlopen("http://192.168.1.254/index.cgi?active_page=9133&active_page_str=page_bt_home&req_mode=0&mimic_button_field=btn_tab_goto:+9133..&request_id=36590071&button_value=9133")
>>> soup = BeautifulSoup(page)
>>> soup.prettify()

(html output is removed by markdown)


Use Firebug to watch which headers and cookies are sent to the server. Then emulate the same request with urllib2.Request and cookielib.
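
A minimal sketch of that approach, assuming the header values below stand in for whatever Firebug actually shows (they are placeholders, not values taken from the router). The try/except import is only there so the sketch also runs on Python 3, where urllib2 and cookielib were renamed; `make_browser_like_opener` and `make_request` are hypothetical helper names.

```python
try:  # Python 2, as in the question
    import urllib2 as request_lib
    import cookielib as cookie_lib
except ImportError:  # Python 3 renamed both modules
    import urllib.request as request_lib
    import http.cookiejar as cookie_lib

# Placeholder values: copy the real ones from Firebug's Net panel.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml",
    "Referer": "http://192.168.1.254/",
}


def make_browser_like_opener():
    """Return an opener that stores received cookies and sends them back."""
    jar = cookie_lib.CookieJar()
    opener = request_lib.build_opener(request_lib.HTTPCookieProcessor(jar))
    return opener, jar


def make_request(url, headers=BROWSER_HEADERS):
    """Build a request carrying the headers captured from the browser."""
    return request_lib.Request(url, headers=headers)


opener, jar = make_browser_like_opener()
request = make_request("http://192.168.1.254/index.cgi")
# On the router's network, fetch the page with:
# page = opener.open(request).read()
```

Any cookie the device sets on the first response ends up in `jar` and is sent automatically on later calls made through the same `opener`.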

EDIT: You can also use mechanize.


It may be simpler than Wireshark to use Firebug to see the form of the request being made, and then emulate the same request in your code.


Use Wireshark to see what your browser's request looks like, and add the missing parts so that your request looks the same.

To tweak urllib2 headers, try this.
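
As a rough illustration of adding the missing parts, the sketch below overwrites an opener's default headers with ones captured from the browser. The captured values are placeholders, and `apply_browser_headers` is a hypothetical helper, not part of urllib2. The import fallback is only for running it on Python 3.

```python
try:  # Python 2
    import urllib2 as request_lib
except ImportError:  # Python 3
    import urllib.request as request_lib


def apply_browser_headers(opener, browser_headers):
    """Replace same-named default headers and append the remaining ones."""
    captured = {name.lower() for name, _ in browser_headers}
    kept = [(name, value) for name, value in opener.addheaders
            if name.lower() not in captured]
    opener.addheaders = kept + list(browser_headers)
    return opener


# Placeholder capture: substitute what Wireshark shows for your browser.
browser_headers = [
    ("User-Agent", "Mozilla/5.0"),
    ("Accept", "text/html"),
    ("Accept-Language", "en-US,en;q=0.5"),
]

opener = request_lib.build_opener()
default_headers = list(opener.addheaders)  # User-agent is "Python-urllib/<version>"
apply_browser_headers(opener, browser_headers)
```

The default User-agent is the most visible difference between a urllib2 request and a browser request, which is why it is replaced rather than kept alongside the captured one.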


This probably isn't working because you haven't supplied credentials for the admin page.

Use mechanize to load the login page and fill out the username/password.

Then you should have a cookie set to allow you to continue to the admin page.

It is much harder using just urllib2. You will need to manage the cookies yourself if you choose to stick to that route.
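
For anyone staying with plain urllib2, a login flow might look roughly like the sketch below. The login URL and the `username`/`password` field names are assumptions; the router's real form has to be inspected to find the actual ones. `build_login_request` and `fetch_after_login` are hypothetical helpers, and the import fallback is only for Python 3.

```python
try:  # Python 2
    import urllib2 as request_lib
    import cookielib as cookie_lib
    from urllib import urlencode
except ImportError:  # Python 3
    import urllib.request as request_lib
    import http.cookiejar as cookie_lib
    from urllib.parse import urlencode


def build_login_request(login_url, fields):
    """Build the POST a browser would send when submitting the login form."""
    body = urlencode(fields).encode("ascii")
    return request_lib.Request(login_url, data=body)


def fetch_after_login(login_request, page_url):
    """Log in, then fetch page_url with whatever cookie the login set."""
    jar = cookie_lib.CookieJar()
    opener = request_lib.build_opener(request_lib.HTTPCookieProcessor(jar))
    opener.open(login_request).close()   # the session cookie lands in jar
    return opener.open(page_url).read()  # the cookie is sent back here


# Assumed URL and field names, for illustration only.
login_request = build_login_request(
    "http://192.168.1.254/index.cgi",
    [("username", "admin"), ("password", "secret")],
)
# On the router's network:
# html = fetch_after_login(login_request,
#                          "http://192.168.1.254/index.cgi?active_page=9133")
```

Both requests must go through the same opener; a fresh `urlopen` call would not carry the session cookie.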


In my case it was one of the following:

1) The website could tell that the request was not coming from a browser, so I had to fake a browser in Python like this:

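import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://example.com/"  # placeholder: the page to fetch
# Build an opener to fake a browser
opener = urllib2.build_opener()
# Replace the default "Python-urllib" User-agent with a browser-like one
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
# Read the page and parse it
soup = BeautifulSoup(opener.open(url).read())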

2) The content of the page was filled in dynamically by JavaScript. In that case, read the following post: https://stackoverflow.com/a/11460633/2160507
