Hadoop: High CPU load on client side after submitting jobs
I couldn't find an answer to my issue while sifting through various Hadoop guides: I am submitting many Hadoop jobs (up to 200) in one go via a shell script on a client machine. Each job is started by means of a JAR, which is quite large (approx. 150 MB). Right after submitting the jobs, the client machine is under very high CPU load (every core at 100%) and its RAM fills up quite fast, to the point where the client is no longer usable. I thought that the computation of each job is done entirely within the Hadoop framework, and that only some status information is exchanged between the cluster and the client while a job is running.
So why is the client under full load? Am I submitting Hadoop jobs the wrong way? Is each JAR too big?
Thanks in advance.
It is not about the JAR. The client side calculates the InputSplits, so if each job has a large number of input files, the client machine can end up under a lot of load.
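To illustrate, here is a minimal sketch of that client-side split calculation, assuming the org.apache.hadoop.mapreduce API; the input path argument and the use of TextInputFormat are just assumptions. It calls the same getSplits() that the job submitter runs locally before anything reaches the cluster, which lists every input file and computes block-aligned splits in the client JVM.

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "split-count");
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // The same work the submitter does on the client: list all input
        // files and compute the splits locally, before the cluster is involved.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("Splits computed on the client: " + splits.size());
    }
}

With many input files per job, and 200 jobs submitted at once, this listing and split computation alone can keep the client busy.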
But I suspect that when you submit 200 jobs at once, the RPC handlers on the JobTracker run into problems. How many RPC handlers are active on the JobTracker (in Hadoop 1.x this is controlled by mapred.job.tracker.handler.count)?
In any case, I would batch the submissions, say 10 or 20 jobs at a time, and wait for their completion before submitting the next batch; a sketch of this is below. I assume you are using the default FIFO scheduler? In that case you won't benefit from submitting all 200 jobs at once anyway.
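A minimal sketch of that batching idea, assuming the jobs are driven from Java via the org.apache.hadoop.mapreduce API rather than the shell script; buildJob(), the batch size and the polling interval are hypothetical placeholders for your own setup.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BatchedSubmitter {
    private static final int BATCH_SIZE = 10;   // 10-20 jobs per batch
    private static final int TOTAL_JOBS = 200;

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        for (int start = 0; start < TOTAL_JOBS; start += BATCH_SIZE) {
            List<Job> batch = new ArrayList<Job>();

            // Submit one batch; submit() returns as soon as the job has been
            // handed to the JobTracker, it does not wait for the job to finish.
            for (int i = start; i < Math.min(start + BATCH_SIZE, TOTAL_JOBS); i++) {
                Job job = buildJob(conf, i);  // hypothetical per-job setup
                job.submit();
                batch.add(job);
            }

            // Wait for every job in this batch before submitting the next one,
            // so the client never has more than BATCH_SIZE jobs in flight.
            for (Job job : batch) {
                while (!job.isComplete()) {
                    Thread.sleep(5000);
                }
            }
        }
    }

    // Hypothetical placeholder: set jar, mapper, reducer, input/output paths here.
    private static Job buildJob(Configuration conf, int index) throws Exception {
        Job job = new Job(conf, "job-" + index);
        // job.setJarByClass(...); FileInputFormat.addInputPath(job, ...); etc.
        return job;
    }
}

That way only one batch's worth of split computation and job monitoring runs on the client at any time, which should keep its CPU and RAM usage within reason.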