urllib2 redirect returns an empty page (even though the code is 200 and geturl() points to the new page)
I am trying to access a web page using urllib2, but urllib2's automatic redirect handling does not seem to retrieve the entire page. Here is my code:
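import urllib2

# link is one of the URLs from a) or b) below
request = urllib2.Request(link)
request.add_header('User-Agent', '...')
opener = urllib2.build_opener()  # the default opener follows HTTP redirects
page = opener.open(request)
print(page.code)       # status code of the final response, after any redirects
print(page.geturl())   # URL of the final response
print(page.read())     # body of the final response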
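urllib2 exists only in Python 2; in Python 3 the same classes live in `urllib.request`. A minimal Python 3 sketch of the code above, assuming you want to see every redirect hop the opener follows. `RedirectLogger` and `fetch` are hypothetical names: the handler records each status code and target URL, then delegates to the standard `HTTPRedirectHandler.redirect_request`, so redirects are still followed exactly as before.

```python
# Python 3 counterpart of the question's urllib2 code (urllib2 -> urllib.request).
import urllib.request


class RedirectLogger(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: records every redirect hop, then follows it normally."""

    def __init__(self):
        super().__init__()
        self.hops = []  # (status code, target URL) for each redirect

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, newurl))
        # The base class builds the follow-up Request (or raises HTTPError).
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch(link, user_agent="..."):
    """Hypothetical helper: return (status, final URL, body bytes, redirect hops)."""
    logger = RedirectLogger()
    # A HTTPRedirectHandler subclass passed here replaces the default redirect handler.
    opener = urllib.request.build_opener(logger)
    request = urllib.request.Request(link)
    request.add_header("User-Agent", user_agent)
    with opener.open(request) as page:
        return page.getcode(), page.geturl(), page.read(), logger.hops
```

Printing the returned hops for link b) would show whether the server redirected once or bounced through several URLs before answering with the blank 200 page.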
a) When link = 'https://www.google.com', it prints:
200
https://www.google.com
<!doctype...> Etc. Etc. </script>
b) When link = 'https://www.xyz.com/a_link_which_is_redirected.html', it prints:
200
https://the_new_link
<blank>
However, if I open the link from b) in a web browser, it correctly displays a page with a form.
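One difference between a browser and the opener above: a browser stores cookies set during a redirect and sends them back on the next request, while an opener from `build_opener()` does not, because `HTTPCookieProcessor` is not one of its default handlers. If (and this is only a guess) the site sets a session cookie on the redirect and serves an empty page without it, a cookie-aware opener may help. A minimal Python 3 sketch; in Python 2 the equivalents are `urllib2.HTTPCookieProcessor` and `cookielib.CookieJar`, and `build_cookie_opener` is a hypothetical name.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener(user_agent="..."):
    """Hypothetical helper: an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    # Assumption: the site sets a cookie during the redirect and expects it back.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # addheaders is sent with every request this opener makes, redirect hops included.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar
```

After `page = opener.open(link)`, printing `jar` lists whatever cookies the site set along the way; an empty jar makes the cookie theory unlikely.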
View the source of the Google page: it really does end with a script tag. Google leaves off some optional closing tags (here, </body> and </html>) because browsers still interpret the page correctly, and omitting them saves bandwidth.
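Python's own `html.parser` is just as forgiving. In this small sketch (the sample markup is made up, and `TagCollector` is a hypothetical name), a document that stops right after `</script>` parses without error; the parser simply never reports `</body>` or `</html>`.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records start and end tags in the order they are reported."""

    def __init__(self):
        super().__init__()
        self.starts, self.ends = [], []

    def handle_starttag(self, tag, attrs):
        self.starts.append(tag)

    def handle_endtag(self, tag):
        self.ends.append(tag)


# Made-up page that, like Google's, ends at </script> with no </body> or </html>.
doc = "<!doctype html><html><head></head><body><p>hi<script>var x = 1;</script>"
collector = TagCollector()
collector.feed(doc)
collector.close()  # no exception: the missing closing tags are not an error
print(collector.starts)
print(collector.ends)
```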
Here are some test redirect pages. Which of those do not work for you?
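As an offline complement to public test pages, the sketch below starts a throwaway local server (the handler, paths and form markup are all hypothetical) that answers /old with a 302 to /new and serves a small form there, then fetches /old the same way the question does, using Python 3's `urllib.request`. If it returns 200, the /new URL and the form markup, urllib's automatic redirect handling is working, and the blank page more likely depends on what the real site sends to a non-browser client.

```python
import http.server
import threading
import urllib.request


class RedirectDemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server: /old answers 302 -> /new, /new serves a small form."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")  # relative; urllib resolves it
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/new":
            body = b"<form action='/submit'><input name='q'></form>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Serve on a free localhost port, fetch /old, and return (status, final URL, body)."""
    server = http.server.HTTPServer(("127.0.0.1", 0), RedirectDemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        request = urllib.request.Request("http://127.0.0.1:%d/old" % port)
        request.add_header("User-Agent", "...")
        # ProxyHandler({}) disables environment proxies so localhost is reached directly.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request) as page:
            return page.getcode(), page.geturl(), page.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    code, url, body = run_demo()
    print(code)
    print(url)
    print(body)
```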