Hadoop fully distributed mode

I am a newbie to Hadoop. I have managed to develop a simple MapReduce application that works fine in pseudo-distributed mode, and now I want to test it in fully distributed mode. I have a few questions about that:

  1. How many machines (nodes) do I need (minimum and recommended) to process a file of 1-10 GB?
  2. What are the hardware requirements (mainly, the number of cores, memory, and disk space per node)?
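
For context, my understanding is that moving from pseudo-distributed to fully distributed mode is mostly a configuration change: point the filesystem and JobTracker settings at a master host and list the worker hosts in conf/slaves. A minimal sketch for the Hadoop 0.20/1.x layout, where master, slave1, and slave2 are placeholder hostnames:

```xml
<!-- conf/core-site.xml on every node: the HDFS NameNode to talk to.
     "master" is a placeholder hostname. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml on every node: the JobTracker address. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
```

The conf/slaves file on the master then lists one worker hostname per line (slave1, slave2, ...), and dfs.replication in conf/hdfs-site.xml should not exceed the number of DataNodes.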


I'd check out Cloudera's hardware recommendations: http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/

A snippet from that page:

Various hardware configurations for different workloads, including our original “base” recommendation:

  • Light Processing Configuration (1U/machine): Two quad core CPUs, 8GB memory, and 4 disk drives (1TB or 2TB). Note that CPU-intensive work such as natural language processing involves loading large models into RAM before processing data and should be configured with 2GB RAM/core instead of 1GB RAM/core.
  • Balanced Compute Configuration (1U/machine): Two quad core CPUs, 16 to 24GB memory, and 4 disk drives (1TB or 2TB) directly attached using the motherboard controller. These are often available as twins with two motherboards and 8 drives in a single 2U cabinet.
  • Storage Heavy Configuration (2U/machine): Two quad core CPUs, 16 to 24GB memory, and 12 disk drives (1TB or 2TB). The power consumption for this type of machine starts around ~200W in idle state and can go as high as ~350W when active.
  • Compute Intensive Configuration (2U/machine): Two quad core CPUs, 48-72GB memory, and 8 disk drives (1TB or 2TB). These are often used when a combination of large in-memory models and heavy reference data caching is required.
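
To relate those machine specs back to the 1-10 GB question: Hadoop schedules roughly one map task per HDFS block of input, so the input size mostly determines how many map tasks you get, and the slots-per-node figure determines how many run at once. Here is a back-of-envelope sketch in Java, where the block size is the old 64 MB default and the cluster size and slot count are illustrative assumptions, not recommendations:

```java
// Rough sizing arithmetic only; all constants are assumptions for illustration.
public class ClusterSizing {
    public static void main(String[] args) {
        long inputBytes      = 10L * 1024 * 1024 * 1024; // 10 GB of input
        long blockSize       = 64L * 1024 * 1024;        // old default HDFS block size
        int  mapSlotsPerNode = 8;                        // e.g. one slot per core on a 2x quad-core box
        int  nodes           = 5;                        // hypothetical small cluster

        // Roughly one map task per HDFS block of input.
        long mapTasks = (inputBytes + blockSize - 1) / blockSize;   // ~160

        // How many scheduling "waves" it takes to run all the maps.
        long slots = (long) mapSlotsPerNode * nodes;                // 40
        long waves = (mapTasks + slots - 1) / slots;                // 4

        System.out.println(mapTasks + " map tasks over " + slots
                + " slots = ~" + waves + " wave(s)");
    }
}
```

The practical takeaway is that 1-10 GB is small by Hadoop standards: even two or three machines from the "Light Processing" row above will handle it comfortably, and a two- or three-node cluster is enough to exercise fully distributed mode.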