Simple non-network concurrency with Twisted
I have a problem with using Twisted for simple concurrency in python. The problem is - I don't know how to do it and all online resources are about Twisted networking abilities. So I am turning to SO-guru开发者_如何学运维s for some guidance.
Python 2.5 is used.
Simplified version of my problem runs as follows:
- A bunch of scientific data
- A function that munches on the data and creates output
- ??? < here enters concurrency, it takes chunks of data from 1 and feeds it to 2
- Output from 3 is joined and stored
My guess is that Twisted reactor
can do the number three job. But how?
Thanks a lot for any help and suggestions.
upd1:
Simple example code. No idea how reactor deals with processes, so I have given it imaginary functions:
datum = 'abcdefg'
def dataServer(data):
for char in data:
yield chara
def dataWorker(chara):
return ord(chara)
r = reactor()
NUMBER_OF_PROCESSES_AV = 4
serv = dataserver(datum)
id = 0
result = array(len(datum))
while r.working():
if NUMBER_OF_PROCESSES_AV > 0:
r.addTask(dataWorker(serv.next(), id)
NUMBER_OF_PROCESSES_AV -= 1
id += 1
for pr, id in r.finishedProcesses():
result[id] = pr
As Jean-Paul said, Twisted is great for coordinating multiple processes. However, unless you need to use Twisted, and simply need a distributed processing pool, there are possibly better suited tools out there.
One I can think of which hasn't been mentioned is celery. Celery is a distributed task queue - you set up a queue of tasks running a DB, Redis or RabbitMQ (you can choose from a number of free software options), and write a number of compute tasks. These can be arbitrary scientific computing type tasks. Tasks can spawn subtasks (implementing your "joining" step you mention above). You then start as many workers as you need and compute away.
I'm a heavy user of Twisted and Celery, so in any case, both options are good.
To actually compute things concurrently, you'll probably need to employ multiple Python processes. A single Python process can interleave calculations, but it won't execute them in parallel (with a few exceptions).
Twisted is a good way to coordinate these multiple processes and collect their results. One library oriented towards solving this task is Ampoule. You can find more information about Ampoule on its Launchpad page: https://launchpad.net/ampoule.
Do you need Twisted at all?
From your description of the problem I'd say that multiprocessing would fit the bill. Create a number of Process
objects that are given a reference to a single Queue
instance. Get them to start their work and put their results on the Queue
. Just use blocking get()
s to read the results.
It seems to me that you are misunderstanding the fundamentals of how Twisted operates. I recommend you give the Twisted Intro a shot by Dave Peticolas. It has been a great help to me, and I've been using Twisted for years!
HINT: Everything in Twisted relies on the reactor!
(source: krondo.com)
精彩评论