
Adding a new volume to a pseudo-distributed Hadoop node failing silently

I'm attempting to add a new volume to a Hadoop pseudo-distributed node by adding the location of the volume to dfs.name.dir in hdfs-site.xml, and I can see the lock file in this location. But try as I might, when I load files (using Hive) these locations are hardly used (even though the lock files and some sub-folders appear, so Hadoop clearly had access to them). When the main volume comes close to running out of space, I get the following exception:

Failed with exception java.io.IOException: File /tmp/hive-ubuntu/hive_2011-02-24_15-39-15_997_1889807000233475717/-ext-10000/test.csv could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:643)

Any pointers on how to add new volumes to Hadoop? FWIW, I'm using EC2.
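
For reference, the kind of entry I added to hdfs-site.xml looks roughly like this (the paths are illustrative, not my actual mount points; the new volume is appended as a comma-separated entry):

    <property>
      <name>dfs.name.dir</name>
      <value>/var/lib/hadoop/name,/mnt/new-volume/hadoop/name</value>
    </property>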


There are a few things you can do, according to the FAQ:

  1. Manually copy files in HDFS to a new name, delete the old files, then rename the new files to be what they were originally.
  2. Increase the replication factor temporarily, setting it back once blocks have balanced out between nodes.
  3. Remove the full node, wait for its blocks to replicate to the other nodes, then bring it back up. This doesn't really help because your full node is still full when you bring it back online.
  4. Run the rebalancer script on the head node.

I'd try running #4 first, then #2.
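
For #4, on a typical install the rebalancer is kicked off from the head node roughly like this (the threshold percentage is illustrative; bin/start-balancer.sh starts the same thing as a background daemon):

hadoop balancer -threshold 5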


When adding new disks/capacity to a data node, Hadoop does not guarantee that the disks will be balanced fairly (e.g. it will not put more blocks on the drives with more free space). The best way I have found to solve this is to temporarily increase the replication factor (e.g. from 2 to 3).

hadoop fs -setrep -R 3 /<path>

Watch the 'under replicated blocks' report on the name node. As soon as this reaches 0, decrease the replication factor (e.g. from 3 to 2). This will randomly delete replicas from the system, which should balance out the local node.

hadoop fs -setrep -R 2 /<path>

It's not going to be 100% balanced, but it should be in a lot better shape than it was before. This is covered in the Hadoop wiki to some extent. If you are running pseudo-distributed and have no other data nodes, the balancer script will not help you.

http://wiki.apache.org/hadoop/FAQ#If_I_add_new_DataNodes_to_the_cluster_will_HDFS_move_the_blocks_to_the_newly_added_nodes_in_order_to_balance_disk_space_utilization_between_the_nodes.3F
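
To keep an eye on the 'under replicated blocks' count mentioned above, one option is the summary that fsck prints (the path is illustrative):

hadoop fsck /

Look for the "Under-replicated blocks" line in its output; hadoop dfsadmin -report exposes a similar counter.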
