
Making urllib requests in Python from the client side

I've written a Python application that makes web requests using the urllib2 library and then scrapes the data. I could deploy this as a web application, which means all urllib2 requests go through my web server. That risks the server's IP being banned because of the high number of web requests made on behalf of many users. The other option is to create a desktop application, which I don't want to do. Is there any way I could deploy my application so that the web requests are made from the client side? One way would be to use Jython to create an applet, but I've read that Java applets can only make web requests to the server they are deployed on, and the only way to circumvent this is to create a server-side proxy, which leads back to the problem of the server's IP getting banned.

This might sound like an impossible situation, and I'll probably end up creating a desktop application, but I thought I'd ask if anyone knew of an alternative solution.

Thanks.


You can use a signed Java applet. Signed applets can use the Java security mechanism to enable access to any site. This tutorial explains exactly what you have to do: http://www-personal.umich.edu/~lsiden/tutorials/signed-applet/signed-applet.html

The same might be possible from a Flash applet. JavaScript is also restricted to the site that published it, and AFAIK it can't be signed or granted security exceptions like this.


You can probably use AJAX requests made from JavaScript running on the client side.

  • Use server → client communication to send the commands and the data needed to make a request
  • …and then use AJAX communication from the client to the 3rd-party server.
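
The server half of the first bullet could be sketched in Python roughly as below. This is only an illustration: the function name `build_fetch_job` and the JSON field names are hypothetical, and the client-side JavaScript that would read this payload and issue the AJAX request is not shown.

```python
import json


def build_fetch_job(job_id, url, method="GET", headers=None):
    """Serialize one request description for the browser to carry out.

    The field names here are made up for this sketch; the client-side
    script would need to agree on the same layout.
    """
    job = {
        "job_id": job_id,          # lets the client report results back
        "url": url,                # third-party page the client should request
        "method": method,
        "headers": headers or {},  # optional extra request headers
    }
    return json.dumps(job, sort_keys=True)
```

The server would return this string from whatever endpoint the page polls, and the client would post the fetched HTML back under the same `job_id`.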


This depends on the form of "scraping" you intend to do:

  • You might run into problems running an AJAX call to a third-party site. Please see Screen scraping through AJAX and javascript.
  • An alternative would be to do it server-side, but to cache the results so that you don't hit the third-party server unnecessarily.
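
The caching idea in the second bullet might look something like this minimal sketch. It assumes Python 3, where urllib2's `urlopen` lives in `urllib.request`; the class name `CachedFetcher`, the in-memory dictionary, and the five-minute default lifetime are all assumptions made for illustration, not something from the question.

```python
import time
import urllib.request


def fetch_url(url, timeout=10):
    """Download a URL and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


class CachedFetcher:
    """Remember recent responses so repeated requests skip the network."""

    def __init__(self, fetch=fetch_url, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # function that actually downloads a URL
        self._ttl = ttl_seconds  # how long a cached body stays valid
        self._clock = clock      # injectable so the expiry logic can be tested
        self._cache = {}         # url -> (expiry time, body)

    def get(self, url):
        now = self._clock()
        entry = self._cache.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no request is made
        body = self._fetch(url)
        self._cache[url] = (now + self._ttl, body)
        return body
```

With something like `fetcher = CachedFetcher()`, every user asking for the same page within the lifetime window shares one request to the third-party server. A real deployment would probably want a shared store rather than a per-process dictionary.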

Check out diggstripper on Google Code.
