Hadoop job taking input files from multiple directories
I have to feed all these files into one Map job. From what I see, to use MultipleFileInputFormat all input files need to be in the same directory. Is it possible to pass multiple directories directly into the job?
If not, is it possible to efficiently put these files into one directory without naming conflicts, or to merge them into one single compressed .gz file? Note: I am implementing the Mapper in plain Java, not using Pig or Hadoop streaming. Any help regarding the above issue will be deeply appreciated.
Thanks, Ankit

FileInputFormat.addInputPaths() can take a comma-separated list of multiple files, like
FileInputFormat.addInputPaths(job, "foo/file1.gz,bar/file2.gz");
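To sketch how this fits together: since addInputPaths() takes one comma-separated string, you can join however many directories you have before making the call. The helper and directory names below are hypothetical; the Hadoop calls themselves are shown in comments so the snippet stays self-contained:

```java
import java.util.Arrays;
import java.util.List;

public class InputPathsDemo {

    // Join any number of input directories into the single
    // comma-separated string that addInputPaths() expects.
    static String joinPaths(List<String> dirs) {
        return String.join(",", dirs);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("foo", "bar", "baz");
        String paths = joinPaths(dirs);
        System.out.println(paths);

        // In the actual driver this string would be passed to the job:
        //   FileInputFormat.addInputPaths(job, paths);
        // Alternatively, FileInputFormat.addInputPath(job, new Path(dir))
        // can be called once per directory in a loop.
    }
}
```

Either way, no copying or merging of the input files into a single directory is needed.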