Annoying Twisted Python problem
I'm trying to answer the following question out of personal interest: What is the fastest way to send 100,000 HTTP requests in Python?
This is what I have come up with so far, but I'm experiencing something very strange.
When installSignalHandlers is False, it just hangs. I can see that the DelayedCall instances are in reactor._newTimedCalls, but processResponse never gets called.
When installSignalHandlers is True, it throws an error and works.
from twisted.internet import reactor
from twisted.web.client import Agent
from threading import Semaphore, Thread
import time

concurrent = 100
s = Semaphore(concurrent)
reactor.suggestThreadPoolSize(concurrent)
t = Thread(
    target=reactor.run,
    kwargs={'installSignalHandlers': True})
t.daemon = True
t.start()
agent = Agent(reactor)

def processResponse(response, url):
    print response.code, url
    s.release()

def processError(response, url):
    print "error", url
    s.release()

def addTask(url):
    req = agent.request('HEAD', url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)

for url in open('urllist.txt'):
    addTask(url.strip())
    s.acquire()
while s._Semaphore__value != concurrent:
    time.sleep(0.1)
reactor.stop()
And here is the error that it throws when installSignalHandlers is True: (Note: This is the expected behaviour! The question is why it doesn't work when installSignalHandlers is False.)
Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring
    callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning
    self._handleSignals()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals
    signal.signal(signal.SIGINT, self.sigInt)
exceptions.ValueError: signal only works in main thread
What am I doing wrong and what is the right way? I'm new to twisted.
@moshez: Thanks. It works now:
from twisted.internet import reactor, threads
from urlparse import urlparse
import httplib
import itertools

concurrent = 100
finished = itertools.count(1)
reactor.suggestThreadPoolSize(concurrent)

def getStatus(ourl):
    url = urlparse(ourl)
    conn = httplib.HTTPConnection(url.netloc)
    conn.request("HEAD", url.path)
    res = conn.getresponse()
    return res.status

def processResponse(response, url):
    print response, url
    processedOne()

def processError(error, url):
    print "error", url  #, error
    processedOne()

def processedOne():
    if finished.next() == added:
        reactor.stop()

def addTask(url):
    req = threads.deferToThread(getStatus, url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)

added = 0
for url in open('urllist.txt'):
    added += 1
    addTask(url.strip())

try:
    reactor.run()
except KeyboardInterrupt:
    reactor.stop()
You're making way too many "reactor calls" from the main thread (for example, there's a good chance that agent.request calls into the reactor). I'm not sure that is your problem, but it is not supported either way: the only reactor call you may make from a non-reactor thread is reactor.callFromThread.
Also, the whole architecture seems strange. Why are you not running the reactor on the main thread? Reading a whole file of 10,000 requests and splitting it into lines should not be a problem to do from within the reactor, even if you do it all at once.
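The ValueError in the traceback comes from Python itself rather than from Twisted: signal.signal refuses to install a handler from any thread other than the main one, which is one more reason to call reactor.run there. A small stdlib-only sketch (Python 3, function names made up for illustration):

```python
import signal
import threading


def install_sigint_handler():
    """Try to install a SIGINT handler; return the error text, or None on success."""
    try:
        signal.signal(signal.SIGINT, signal.default_int_handler)
    except ValueError as exc:
        return str(exc)
    return None


def install_in_worker_thread():
    """Make the same attempt from a non-main thread and report what happened."""
    outcome = []
    worker = threading.Thread(target=lambda: outcome.append(install_sigint_handler()))
    worker.start()
    worker.join()
    return outcome[0]
```

Calling install_in_worker_thread() returns the "signal only works in main thread" message (the exact wording varies a little between Python versions), the same failure the reactor hits in _handleSignals.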
You can probably arrive at a pure-Twisted solution that does not use any threads.