Amazon Elastic MapReduce - Keep server alive?
I am testing jobs in EMR, and every test takes a long time to start up. Is there a way to keep the server/master node alive in Amazon EMR? I know this can be done with the API, but I wanted to know if it can be done in the AWS console.
You cannot do this from the AWS console. To quote the developer guide:
"The Amazon Elastic MapReduce tab in the AWS Management Console does not support adding steps to a job flow."
You can only do this via the CLI and API, by creating a job flow, then adding steps to it.
$ ./elastic-mapreduce --create --alive --stream
You can't do this with the web console, but through the API and programming tools you can add multiple steps to a long-running job flow, which is what I do. That way you can fire off jobs one after the other on the same long-running cluster, without having to create a new one each time.
If you are familiar with Python, I highly recommend the Boto library. The other AWS API tools let you do this as well.
If you follow the Boto EMR tutorial, you'll find some examples.
Just to give you an idea, this is what I do (with streaming jobs):
import sys
import time

import boto
from boto.emr.step import StreamingStep

# Connect to EMR
conn = boto.connect_emr()

# Start a long-running job flow; don't forget the keep_alive setting
jobid = conn.run_jobflow(name='My jobflow',
                         log_uri='s3://<my log uri>/jobflow_logs',
                         keep_alive=True)

# Create your streaming job (fill in your own arguments)
step = StreamingStep(...)

# Add the step to the job flow
conn.add_jobflow_steps(jobid, [step])

# Wait until the most recently added step is complete
while True:
    state = conn.describe_jobflow(jobid).steps[-1].state
    if state == "COMPLETED":
        break
    if state in ("FAILED", "TERMINATED", "CANCELLED"):
        sys.stderr.write("EMR job failed! Message = %s!\n" % state)
        sys.exit(1)
    time.sleep(60)

# Create your next job here and add it to the same EMR cluster
step = StreamingStep(...)
conn.add_jobflow_steps(jobid, [step])

# Repeat :)
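Since the wait loop gets repeated for every step, it may be worth factoring it into a small helper. The following is only a sketch: `wait_for_step`, `get_state`, and the `sleep` parameter are hypothetical names (not part of Boto), and the state strings are simply the ones checked in the loop above. It raises an exception instead of exiting, so the caller can decide what to do on failure.

```python
import time

# States treated as failures, taken from the loop above (assumed, not exhaustive).
FAILURE_STATES = {"FAILED", "TERMINATED", "CANCELLED"}


def wait_for_step(get_state, poll_seconds=60, sleep=time.sleep):
    """Poll get_state() until the step completes; raise if it fails.

    get_state is any zero-argument callable that returns the step's
    current state as a string. sleep is injectable so the helper can
    be exercised without actually waiting.
    """
    while True:
        state = get_state()
        if state == "COMPLETED":
            return state
        if state in FAILURE_STATES:
            raise RuntimeError("EMR step ended in state %s" % state)
        sleep(poll_seconds)
```

With the connection from the example above, it could be called as wait_for_step(lambda: conn.describe_jobflow(jobid).steps[-1].state) after each add_jobflow_steps call.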
To keep the machine alive, start an interactive Pig session; the machine then won't shut down. You can then execute your map/reduce logic from the command line using:
cat infile.txt | yourMapper | sort | yourReducer > outfile.txt
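To make the shape of that pipeline concrete, here is a rough Python sketch of the same map, sort, reduce flow for a word count. The `mapper`, `reducer`, and `run_local` functions are made up for illustration and stand in for yourMapper and yourReducer; real streaming scripts would read stdin and write stdout instead.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reducer(sorted_records):
    """Sum the counts for each key; input must already be sorted by key."""
    keyed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def run_local(lines):
    """Rough equivalent of: cat infile.txt | mapper | sort | reducer."""
    return list(reducer(sorted(mapper(lines))))
```

For example, run_local(["a b a", "b a"]) returns ["a\t3", "b\t2"]. The sort in the middle matters: the reducer relies on all records for a key arriving next to each other, which is the same guarantee Hadoop streaming gives between the map and reduce phases.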