Cannot fetch a web site with python urllib.urlopen() or any web browser other than Shiretoko
Here is the URL of the site I want to fetch
https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags
When I fetch the web site with the following code and display the contents with the following code:
sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff's+tags")
html = sock.read()
sock.close()
soup = BeautifulSoup(html)
print soup.prettify()
I get the following output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>
Error message
</title>
</head>
<body>
<h2>
Invalid input data
</h2>
</body>
</html>
I get the same result with urllib2 as well. Now interestingly, this URL works on only Shiretoko web browser v3.5.7. (when I say it works I mean that it brings me the right page). When I feed this URL into Firefox 3.0.15 or Konqueror v4.2.2. I get exactly the same error page (with "Invalid input data"). I don't have any id开发者_C百科ea what creates this difference and how I can fetch this page using Python. Any ideas?
Thanks
If you see the urllib2 doc, it says
urllib2.build_opener([handler, ...])¶
.....
If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.
.....
you can try using urllib2 together with ssl module. alternatively, you can use httplib
That's exactly what you get when you click on the link with a webbrowser. Maybe you are supposed to be logged in or have a cookie set or something
I get the same message for firefox 3.5.8 (shiretoko) on linux
精彩评论