
Cannot fetch a web site with python urllib.urlopen() or any web browser other than Shiretoko

Here is the URL of the page I want to fetch:

https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff%27s+tags

When I fetch the page and print its contents with the following code:

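import urllib
from BeautifulSoup import BeautifulSoup  # assumes BeautifulSoup 3; with bs4 use: from bs4 import BeautifulSoup
# Fetch the page over HTTPS and read the raw HTML
sock = urllib.urlopen("https://salami.parc.com/spartag/GetRepository?friend=jmankoff&keywords=antibiotic&option=jmankoff's+tags")
html = sock.read()
sock.close()
# Parse the response and pretty-print it
soup = BeautifulSoup(html)
print soup.prettify()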

I get the following output:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
 <head>
  <title>
   Error message
  </title>
 </head>
 <body>
  <h2>
   Invalid input data
  </h2>
 </body>
</html>

I get the same result with urllib2 as well. Interestingly, this URL works only in the Shiretoko web browser v3.5.7 (by "works" I mean that it returns the right page). When I enter this URL into Firefox 3.0.15 or Konqueror v4.2.2, I get exactly the same error page (with "Invalid input data"). I have no idea what causes this difference or how I can fetch this page using Python. Any ideas?

Thanks


If you look at the urllib2 documentation, it says:

urllib2.build_opener([handler, ...])

    .....
    If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added. 

    .....

You can try using urllib2 together with the ssl module. Alternatively, you can use httplib.
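
As a rough sketch of how to check this, here is a Python 3 version (there urllib2 became urllib.request). The helper name https_handler_installed is hypothetical, and the snippet only inspects the opener; it does not contact the server.

```python
import urllib.request


def https_handler_installed(opener):
    """Return True if the opener carries an HTTPSHandler."""
    return any(
        isinstance(handler, urllib.request.HTTPSHandler)
        for handler in opener.handlers
    )


# build_opener() adds HTTPSHandler on its own when SSL support is available.
opener = urllib.request.build_opener()
print(https_handler_installed(opener))
```

If this prints False, the installation was most likely built without SSL support, and https:// URLs cannot be opened with this opener.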


That's exactly what you get when you click on the link in a web browser. Maybe you are supposed to be logged in, or have a cookie set, or something similar.

I get the same message with Firefox 3.5.8 (Shiretoko) on Linux.
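
If a cookie does turn out to be required, something along these lines might help. It is only a sketch in Python 3 (urllib.request and http.cookiejar replace urllib2 and cookielib), build_cookie_opener is a hypothetical helper name, and I don't know which page, if any, actually sets the cookie.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    return opener, cookie_jar


opener, cookie_jar = build_cookie_opener()
# Assumed usage: first open whatever page sets the cookie or logs you in,
# then open the target URL with the same opener so the cookie is sent back.
# html = opener.open(target_url).read()
```

Every request made through the same opener shares the jar, so a cookie set by one response is sent with the next request to that site.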
