
Moving files in Hadoop using the Java API?

I want to move files around in HDFS using the Java APIs, but I cannot figure out a way to do this. The FileSystem class only seems to allow moving to and from the local file system, but I want to keep the files in HDFS and move them there.

Am I missing something basic? The only way I can figure to do it is to read it from the input stream and write it back out... and then delete the old copy (yuck).

thanks


Use FileSystem.rename():

public abstract boolean rename(Path src, Path dst) throws IOException

Renames Path src to Path dst. Can take place on local fs or remote DFS.

Parameters:
src - path to be renamed
dst - new path after rename
Returns:
true if rename is successful
Throws:
IOException - on failure
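
As a rough illustration of the call above, here is a minimal sketch of a move within a single file system. The class name HdfsMove, the helper move(), and the command-line arguments are hypothetical names chosen for this example. It assumes the Hadoop client libraries are on the classpath and that both paths live on the same file system.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper class, not part of Hadoop.
public class HdfsMove {

    /** Moves src to dst within one file system; throws if the rename is refused. */
    static void move(FileSystem fs, Path src, Path dst) throws IOException {
        // Create the destination's parent directory first; HDFS refuses
        // the rename when the parent does not exist.
        Path parent = dst.getParent();
        if (parent != null) {
            fs.mkdirs(parent);
        }
        // rename() can signal failure by returning false instead of throwing,
        // so the return value has to be checked.
        if (!fs.rename(src, dst)) {
            throw new IOException("Could not rename " + src + " to " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the source path and args[1] the destination.
        Configuration conf = new Configuration();
        Path src = new Path(args[0]);
        Path dst = new Path(args[1]);
        // Resolve the file system from the source path (or fs.defaultFS
        // when the path carries no scheme).
        FileSystem fs = src.getFileSystem(conf);
        move(fs, src, dst);
    }
}
```

Because rename() only updates metadata on HDFS, no file data is read or rewritten, which avoids the read, write, and delete round trip described in the question.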


The java.nio.* approach may not always work on HDFS, so I found the following solution, which does.

Move files from one directory to another using the org.apache.hadoop.fs.FileUtil.copy API:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

val conf = new Configuration()
// Source and destination are both the default file system here
val srcFs = FileSystem.get(conf)
val dstFs = FileSystem.get(conf)
val dstPath = new Path(DEST_FILE_DIR)

// fileList is assumed to be a collection of Path objects
for (file <- fileList) {
  // The 5th parameter indicates whether the source should be deleted after the copy
  FileUtil.copy(srcFs, file, dstFs, dstPath, true, conf)
}


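I think FileUtil's replaceFile would also serve the purpose, although, as the linked signature shows, it takes java.io.File arguments and so applies to local files rather than HDFS paths. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileUtil.html#replaceFile(java.io.File, java.io.File)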


import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, FileUtil, Path}

val conf = new Configuration()
val srcPath: Path = new Path("hdfs://srcPath")
val srcFs = FileSystem.get(srcPath.toUri, conf)
val dstPath: Path = new Path("hdfs://targetPath/")
val dstFs = FileSystem.get(dstPath.toUri, conf)

// List everything directly under the source directory
val status: Array[FileStatus] = srcFs.listStatus(srcPath)
if (status.nonEmpty) {
  status.foreach { x =>
    println("My files: " + x.getPath)
    // deleteSource = true turns the copy into a move
    FileUtil.copy(srcFs, x.getPath, dstFs, dstPath, true, conf)
    println("Files moved !!" + x.getPath)
  }
} else {
  println("No Files Found !!")
}