Threading HTTP requests (with proxies)
I've looked at similar questions, but there always seems to be a whole lot of disagreement over the best way to handle threading with HTTP.
What I specifically want to do: I'm using Python 2.7, and I want to try and thread HTTP requests (specifically, POSTing something), with a SOCKS5 proxy for each. The code I have already works, but is rather slow since it's waiting for each request (to the proxy server, then the web server) to finish before starting another. Each thread would most likely be making a different request with a different SOCKS proxy.
So far I've been using urllib2 exclusively. I looked into modules like PycURL, but it's extremely difficult to install properly with Python 2.7 on Windows, which is what I'm coding on and want to support. I'd be willing to use any other module, though.
I've looked at these questions in particular:
Python urllib2.urlopen() is slow, need a better way to read several urls
Python - Example of urllib2 asynchronous / threaded request using HTTPS
Many of the examples received downvotes and sparked arguments. Assuming the commenters are correct, making a client with an asynchronous framework like Twisted sounds like it would be the fastest thing to use. However, I Googled ferociously, and Twisted doesn't seem to provide any support for SOCKS5 proxies. I'm currently using the SocksiPy module, and I could try something like:
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, IP, port)
socks.wrapmodule(twisted.web.client)
I have no idea if that would work though, and I also don't even know if Twisted is what I really want to use. I could also just go with the threading module and work that into my current urllib2 code (roughly the sketch below), but if that is going to be much slower than Twisted, I may not want to bother. Does anyone have any insight?
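For reference, here's roughly what I mean by the threading-module fallback. This is only a sketch: the URL, POST parameters, and proxy address are placeholders, and SocksiPy's setdefaultproxy() is process-wide, which is exactly why per-thread proxies are awkward here:
# Rough sketch of the threading fallback (placeholder URL/proxy, one global proxy shared by all threads)
import threading
import socket
import urllib
import urllib2
import socks

# SocksiPy's default proxy is process-wide, so every thread goes through the same one
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 1080)
socket.socket = socks.socksocket

def do_post(url, params):
    data = urllib.urlencode(params)        # encoded POST body
    response = urllib2.urlopen(url, data)  # supplying data makes this a POST
    print('%s returned %d bytes' % (url, len(response.read())))

threads = [threading.Thread(target=do_post,
                            args=('http://example.com/submit', {'key': 'value'}))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()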
Perhaps an easier way would be to just rely on gevent (or eventlet) to let you open lots of connections to the server. These libs monkeypatch the socket module (which urllib2 sits on top of) to make it async, whilst still letting you write code that is sync-ish. Their smaller overhead vs threads also means you can spawn lots more (1000s would not be unusual).
I've used something like this loads (plagiarized from here):
import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()

import urllib2

urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']

def print_head(url):
    print ('Starting %s' % url)
    data = urllib2.urlopen(url).read()
    print ('%s: %s bytes: %r' % (url, len(data), data[:50]))

jobs = [gevent.spawn(print_head, url) for url in urls]
# wait for all greenlets to finish before the script exits
gevent.joinall(jobs)
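If you need the SOCKS5 part as well, you could in principle layer SocksiPy on top of this. The following is only a sketch under assumptions I haven't tested: that socksocket picks up gevent's patched socket when patching happens before the imports, with a placeholder proxy address, and with the same process-wide-proxy caveat as before:
import gevent
from gevent import monkey

monkey.patch_all()  # patch socket/ssl before socks and urllib2 are imported

import socks
import urllib2

# Placeholder proxy; setdefaultproxy() is still process-wide, so all greenlets share it
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', 1080)
socks.wrapmodule(urllib2)

def fetch(url):
    print ('Starting %s' % url)
    return urllib2.urlopen(url).read()

jobs = [gevent.spawn(fetch, url) for url in ['http://www.python.org', 'http://www.google.com']]
gevent.joinall(jobs)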