
Setting up EC2 instances as Celery Workers with a local computer as the host

Similar to my question here, I'm trying to set up multiple Amazon EC2 instances to do some multiprocessing, and I was thinking of using Celery to manage the workers. Has anyone gotten Celery to work on EC2 instances with a local computer as the host?

Does anyone have any good suggestions, tutorials, advice, etc. that may help? I've used Celery for some simple asynchronous processing in Django, but nothing at this scale (the workers and host were on the same machine).
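For concreteness, here's a rough sketch of the setup I have in mind (the hostname, credentials, and task body are placeholders; it assumes a RabbitMQ broker running on my local machine and reachable from the EC2 instances):

```python
# tasks.py -- shared by the local host and the EC2 workers
from celery import Celery

# The broker (e.g. RabbitMQ) runs on the local machine; the EC2 workers
# connect back to it over its public address. Hostname and credentials
# here are placeholders.
app = Celery('tasks',
             broker='amqp://user:password@my-local-host:5672//',
             backend='rpc://')

@app.task
def process(text):
    # stand-in for the real file-processing logic
    return text.upper()
```

Each EC2 instance would then run `celery -A tasks worker`, and the local machine would dispatch work with `process.delay(...)`.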

Also, most of the processing is file-based (i.e. reading and writing files). Do you think it would be better to pickle and transmit the contents of the file with Celery (most files are 1-2 KB of text), or to mirror the filesystem across the EC2 instances and just have the workers return the results (which are usually about 0.5 KB of text)?
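For the transmit-the-contents option, I imagine the task would just take the file's text as an argument (reusing the `app` object from the sketch above; names are made up for illustration):

```python
# The inputs are only 1-2 KB of text, so they can ride inside the task
# message itself instead of going through a shared/mirrored filesystem.
@app.task
def process_file(filename, contents):
    result = contents[:512]          # stand-in for the real ~0.5 KB result
    return {'filename': filename, 'result': result}

# On the local host:
# with open('input.txt') as f:
#     async_result = process_file.delay('input.txt', f.read())
# print(async_result.get())
```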


I've used Amazon SQS for task management with Amazon EC2, and it's a very scalable solution. Boto is the best library I've found for managing Amazon services.
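Here's roughly what that pattern looks like with boto (the queue name and region are just examples; credentials come from the usual boto config/environment):

```python
import boto.sqs
from boto.sqs.message import Message

conn = boto.sqs.connect_to_region('us-east-1')
queue = conn.create_queue('worker-tasks')  # no-op if it already exists

# Producer side (local host): enqueue a task description.
msg = Message()
msg.set_body('{"input_file": "input.txt"}')
queue.write(msg)

# Worker side (EC2): poll, process, then delete the message.
for m in queue.get_messages(num_messages=1):
    print('got task:', m.get_body())  # a real worker would process it here
    queue.delete_message(m)
```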

For storing a large number of small files you can use MongoDB GridFS; it will let you store gigabytes of files. I used MongoDB and got perfect performance for such tasks. The only problem is that MongoDB is very limited on 32-bit architectures. Amazon offers a 64-bit micro instance, and the next 64-bit option by cost is the large instance. The micro instance is very limited in CPU and memory, and if it doesn't fit your needs you'll have to set up a large instance, which may cost a lot.
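Storing and retrieving one of those small files with PyMongo's GridFS looks like this (the host, database, and filenames are placeholders):

```python
import gridfs
from pymongo import MongoClient

db = MongoClient('mongodb://mongo-host:27017')['filestore']
fs = gridfs.GridFS(db)

# Store a small text file; put() returns an ObjectId handle.
file_id = fs.put(b'file contents here', filename='input-0001.txt')

# Retrieve it later by id (or by name via fs.get_last_version).
contents = fs.get(file_id).read()
```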

For my tasks, a micro instance was able to read/write up to 10 GB a day without any problems.

Also, please take a look at Spot Instances. They cost about a third of the on-demand price, and you may find them a good fit for background processing.
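Requesting a spot instance with boto is a few lines (the bid price, AMI id, and instance type below are placeholders):

```python
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')

# Bid for one spot instance; the request is fulfilled when the spot
# price drops below the bid. All values here are placeholders.
requests = conn.request_spot_instances(
    price='0.10',             # max bid in USD/hour
    image_id='ami-xxxxxxxx',  # hypothetical worker AMI
    count=1,
    instance_type='m1.large',
)
print(requests[0].id)
```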
