
Mahout/Hadoop: SQL to SequenceFile

I am starting to use Mahout for clustering, but I am having a hard time converting a SQL (MySQL) dump into a Mahout-compatible SequenceFile. I am using the code below.

SQL Sample

(1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),
(2, 318138, '[fonts, textile, valentines day]', '2011-04-27 21:47:16'),
...

Java

    File url = new File(inputFile);

    // starts the conf
    Configuration conf = new Configuration();

    // creates and configures the conversion job
    Job job = new Job(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(Mapper.class);

    job.setMapperClass(Mapper.class);
    job.setReducerClass(Reducer.class);

    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    TextInputFormat.addInputPath(job, new Path(inputFile));
    SequenceFileOutputFormat.setOutputPath(job, new Path(SequenceFileCreator.SEQUENCE_FOLDER_PATH));

    // submit and wait for completion
    job.waitForCompletion(true);
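
For reference, here is a minimal sketch of the per-row parsing that such a conversion would probably need, assuming every row of the dump looks like the SQL sample above (two numeric columns, a bracketed tag list, and a quoted timestamp). The class `SqlRowParser` and its `parse` method are hypothetical names used only for illustration; they are not part of Mahout or Hadoop. It uses plain Java so it can be tried without a cluster.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Mahout/Hadoop class): splits one dump row into its fields.
public class SqlRowParser {

    // Assumed row shape, taken from the sample:
    // (1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),
    private static final Pattern ROW = Pattern.compile(
        "\\s*\\((\\d+),\\s*(\\d+),\\s*'\\[(.*?)\\]',\\s*'([^']*)'\\)[,;]?\\s*");

    /**
     * Returns {firstId, secondId, tags, timestamp} for a line that matches
     * the assumed row shape, or null for any other line (e.g. "...").
     */
    public static String[] parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] row = parse(
            "(1, 318145, '[running with jentopia, sotm]', '2011-04-27 21:47:16'),");
        // Print the second numeric column and the tag text, tab-separated.
        System.out.println(row[1] + "\t" + row[2]);
    }
}
```

Inside a custom `Mapper`, the `map()` method could call something like this on each input line, skip lines that return null, and write the chosen id as the `LongWritable` key and the tag text as the `Text` value. Which of the two numeric columns should serve as the key is an assumption that depends on the table schema.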

Thanks!
