How to design my mapper?
I have to write a mapreduce job but I do开发者_如何学Pythonnt know how to go about it,
I have jar MARD.jar through which I can instantiate MARD objects. Using which I call the mard.normalize file meathod on it i.e. mard.normaliseFile(bunch of arguments).
This inturn creates certain output file.
For the normalise meathod to run it needs a folder called myMard in the working directory. So I thought that I would give the myMard folder as the in input path to hadoop job, but m not sure if that would help beacuse mard.normaliseFile(bunch of arguments) will search for the myMard folder in the working directory but it will not find it as (**this is what I think) the Mapper will only be able to access the content of files through the "values" obtained from the fileSplit, it cannot give direct access to the files in the myMard folder.
In short I have to execute the follwing code through the MapReduce
File setupFolder = new File(setupFolderName);
setupFolder.mkdirs();
MARD mard = new MARD(setupFolder);
Text valuz = new Text();
IntWritable intval = new IntWritable();
File original = new File("Vca1652.txt");
File mardedxml = new File("Vca1652-mardedxml.txt");
File marded = new File("Vca1652-marded.txt");
mardedxml.createNewFile();
marded.createNewFile();
NormalisationStats stats;
try {
stats = mard.normaliseFile(original,mardedxml,marded,50.0);
//This meathod requires access to the myMardfolder
System.out.println(stats);
} catch (MARDException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Please help
精彩评论