Background jobs on Amazon Web Services
I am new to AWS, so I need some advice on how to correctly set up background jobs. I've got some data (about 30GB) that I need to:
a) download from another server; it is a set of zip archives whose links appear in an RSS feed
b) decompress into S3
c) process each file (or sometimes a group of decompressed files), perform data transformations, and store the results in SimpleDB/S3
d) repeat forever depending on RSS updates
Can someone suggest a basic architecture for a proper solution on AWS?
Thanks.
Denis
I think you should run an EC2 instance to perform all the tasks you need and shut it down when done; that way you pay only for the time the instance runs. Depending on your architecture, however, you might need to keep it running all the time; small instances are very cheap anyway.
download from some other server; it is a set of zip archives with links within an RSS feed
You can use wget.
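A minimal sketch of that step, assuming the feed exposes plain .zip links; the feed URL and download directory are placeholders:

```bash
#!/usr/bin/env bash
# Sketch: pull zip links out of an RSS feed and download them with wget.
# FEED_URL and /tmp/archives are placeholders for your actual feed and path.
FEED_URL="http://example.com/feed.rss"

# Extract URLs that point at .zip files (crude but workable for most feeds),
# deduplicate them, and let wget download anything not already on disk.
wget -qO- "$FEED_URL" \
  | grep -oE 'https?://[^"<[:space:]]+\.zip' \
  | sort -u \
  | wget --no-clobber --input-file=- --directory-prefix=/tmp/archives
```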
decompress into S3
Try s3-tools (github.com/timkay/aws/raw/master/aws).
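A sketch of the unzip-and-upload step. It uses the official AWS CLI (aws s3 cp) as a stand-in for s3-tools, and the bucket name and paths are placeholders:

```bash
#!/usr/bin/env bash
# Sketch: unzip each downloaded archive and push the contents to S3.
# Swap in your preferred S3 tool; the bucket and directories are placeholders.
BUCKET="s3://my-data-bucket"

for archive in /tmp/archives/*.zip; do
  out="/tmp/extracted/$(basename "$archive" .zip)"
  mkdir -p "$out"
  unzip -o "$archive" -d "$out"
  aws s3 cp "$out" "$BUCKET/$(basename "$out")/" --recursive
done
```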
process each file or sometime group of decompressed files, perform transformations of data, and store it into SimpleDB/S3
Write your own bash script.
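For example, a skeleton of such a script; transform.py and the results bucket are hypothetical placeholders for whatever transformation you actually need:

```bash
#!/usr/bin/env bash
# Sketch: run a transformation over each extracted file and store the result.
# transform.py stands in for your own processing logic.
BUCKET="s3://my-results-bucket"

find /tmp/extracted -type f | while IFS= read -r f; do
  python transform.py "$f" > "$f.out"
  aws s3 cp "$f.out" "$BUCKET/results/$(basename "$f").out"
done
```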
repeat forever depending on RSS updates
One more bash script to check for updates, plus cron to run everything on a schedule.
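A sketch of the cron side; check_feed.sh and run_pipeline.sh are hypothetical names for the scripts above, with check_feed.sh assumed to exit non-zero when there is nothing new in the feed:

```bash
# Edit the crontab with `crontab -e`, then add an hourly entry:
0 * * * * /opt/jobs/check_feed.sh && /opt/jobs/run_pipeline.sh >> /var/log/jobs.log 2>&1
```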
First off, write some code that does a) through c). Test it, etc.
If you want to run the code periodically, it's a good candidate for using a background process workflow. Add the job to a queue; when it's deemed complete, remove it from the queue. Every hour or so add a new job to the queue meaning "go fetch the RSS updates and decompress them".
You can do it by hand using Amazon Simple Queue Service (SQS) or any other background-job processing service or library. You'd set up a worker instance on EC2 (or any other hosting solution) that polls the queue, executes the task, and polls again, forever.
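A rough sketch of such a worker loop in bash, assuming the official AWS CLI and jq are installed; the queue URL and handler script are placeholders:

```bash
#!/usr/bin/env bash
# Sketch of an SQS worker loop: receive a message, run the job,
# and delete the message only if the job succeeded.
QUEUE_URL="https://sqs.us-east-1.amazonaws.com/123456789012/rss-jobs"

while true; do
  # Long-poll for one message (up to 20 s); empty output means no work.
  msg=$(aws sqs receive-message --queue-url "$QUEUE_URL" \
        --max-number-of-messages 1 --wait-time-seconds 20)
  [ -z "$msg" ] && continue

  body=$(echo "$msg" | jq -r '.Messages[0].Body')
  receipt=$(echo "$msg" | jq -r '.Messages[0].ReceiptHandle')

  # run_pipeline.sh is the hypothetical job script from above.
  if /opt/jobs/run_pipeline.sh "$body"; then
    aws sqs delete-message --queue-url "$QUEUE_URL" --receipt-handle "$receipt"
  fi
done
```

Long polling (--wait-time-seconds 20) keeps the request count, and therefore the SQS bill, low while the queue is empty; deleting the message only after success means a failed job becomes visible again and gets retried.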
It may be easier to use Amazon Simple Workflow Service, which seems to be intended for what you're trying to do (automated workflows). Note: I've never actually used it.
I think deploying your code on an Elastic Beanstalk instance will do the job for you at scale. You are processing a large chunk of data here, and a single plain EC2 instance might max out its resources, mostly memory. The AWS SQS idea of batching the processing will also help optimize the pipeline and effectively manage timeouts on the server side.