wrote code in java for nutch
hello: I'm writing code in java for nutch(open source search engine) to remove the movments from arabic words in the indexer. I don't know what is the error in it. Tthis is the code:
package com.mycompany.nutch.indexing;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.getData().parse.getData();
public class InvalidUrlIndexFilter implements IndexingFilter {
private static final Logger LOGGER =
Logger.getLogger(InvalidUrlIndexFilter.class);
private Configuration conf;
public void addIndexBackendOptions(Configuration conf) {
// NOOP
return;
}
public NutchDocument filter(开发者_StackOverflow社区NutchDocument doc, Parse parse, Text url,
CrawlDatum datum, Inlinks inlinks) throws IndexingException {
if (url == null) {
return null;
}
char[] parse.getData() = input.trim().toCharArray();
for(int p=0;p<parse.getData().length;p++)
if(!(parse.getData()[p]=='َ'||parse.getData()[p]=='ً'||parse.getData()[p]=='ُ'||parse.getData()[p]=='ِ'||parse.getData()[p]=='ٍ'||parse.getData()[p]=='ٌ' ||parse.getData()[p]=='ّ'||parse.getData()[p]=='ْ' ||parse.getData()[p]=='"' ))
new String.append(parse.getData()[p]);
return doc;
}
public Configuration getConf() {
return conf;
}
public void setConf(Configuration conf) {
this.conf = conf;
}
}
I think that the error is in using parse.getdata()
but I don't know what I should use instead of it?
The line
char[] parse.getData() = input.trim().toCharArray();
will give you a compile error because the left hand side is not a variable. Please replace parse.getData()
by a unique variable name (e.g. parsedData
) in this line and the following lines.
Second the import of
import org.apache.nutch.parse.getData().parse.getData();
will also fail. Looks a lot like a text replace issue.
精彩评论