Errors for running Mahout example
I downloaded the examples of latest version for chapter 09 of “Mahout in Action”. I can successfully run several examples, but for three files, NewsKMeansClustering.java, ReutersToSparseVectors.java, and NewsFuzzyKMeansClusteing.java. Running these three programs gives similar error messages:
Aug 3, 2011 2:03:54 PM org.apache.hadoop.metrics.jvm.JvmMetrics init INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions WARNING: Use GenericOptionsParser for parsing the arguments. Applications should
implement Tool 开发者_运维百科for the same.Aug 3, 2011 2:03:54 PM org.apache.hadoop.mapred.JobClient configureCommandLineOptions WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/home/user1/workspaceMahout1/recommender/inputDir
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779) at org.apache.hadoop.mapreduce.Job.submit(Job.java:432) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93) at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:54)
For the above messages, I do not quite understand what do those two warnings mean? Moreover, it looks like that “input path” should have been created, how can I create this type of input? Thanks.
You can ignore the warnings. The error is that the input directory you have specified does not exist. Does it exist? What is your command line?
I ran into a similar mismatch. The MiA files at https://github.com/tdunning/MiA have some cases where a .csv file is left in the same dir as the Java source. For example https://github.com/tdunning/MiA/tree/master/src/main/java/mia/recommender/ch02 ... however via Eclipse, loading it using DataModel model = new FileDataModel(new File("intro.csv")); ...doesn't find it.
Adding
System.out.println("CWD: "+System.getProperty("user.dir"));
...will reveal where Eclipse is looking (in my case, a couple levels up the filetree, but this might vary depending on how exactly you've set things up).
精彩评论