How do I prevent `hadoop fs -rmr <uri>` from creating $folder$ files?

We're using Amazon's Elastic MapReduce to perform some large file processing jobs. As part of our workflow, we occasionally need to remove files from S3 that may already exist. We do so using the hadoop fs interface, like this:

hadoop fs -rmr s3://mybucket/a/b/myfile.log

This removes the file from S3 appropriately, but in its place leaves an empty file named "s3://mybucket/a/b_$folder$". As described in this question, Hadoop's Pig is unable to handle these files, so later steps in the workflow can choke on this file.

(Note, it doesn't seem to matter whether we use -rmr or -rm or whether we use s3:// or s3n:// as the scheme: all of these exhibit the described behavior.)

How do I use the hadoop fs interface to remove files from S3 and be sure not to leave these troublesome files behind?
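For what it's worth, a quick way to see whether a marker key was left behind is to list the parent path (the bucket and path below are just the ones from the example above; the exact output will vary):

hadoop fs -ls s3://mybucket/a/
# the listing includes an empty entry such as s3://mybucket/a/b_$folder$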


I wasn't able to figure out whether this is possible with the hadoop fs interface. However, the s3cmd interface does the right thing (though only for one key at a time):

s3cmd del s3://mybucket/a/b/myfile.log

This requires configuring a ~/.s3cfg file with your AWS credentials first. s3cmd --configure will interactively help you create this file.
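For reference, a minimal ~/.s3cfg looks roughly like this (the values are placeholders; s3cmd --configure fills in many more options):

[default]
access_key = YOUR_AWS_ACCESS_KEY
secret_key = YOUR_AWS_SECRET_KEY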


That is how the S3 support is implemented in Hadoop; see the NativeS3FileSystem documentation: http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/s3native/NativeS3FileSystem.html.

So use s3cmd.
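Since s3cmd del handles one key at a time, a small shell loop is an easy way to clear several keys before the next job step. A minimal sketch, assuming s3cmd is installed and ~/.s3cfg is configured (the key names here are placeholders):

# delete each stale key before the next job step (key names are placeholders)
for key in \
  s3://mybucket/a/b/myfile.log \
  s3://mybucket/a/b/another.log
do
  s3cmd del "$key"
done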
