Which Nutch jars do I need to write my own Crawl.java?

I am trying to write my own version of Nutch's Crawl.java that does things a little differently. I don't want to work with the Nutch source code; I just want to cleanly import a few jars and get going with my application. How should I provide conf/crawl-urlfilter.txt and the other required conf files?

Could someone help me here? Thanks


One simple way is to package your code in a jar. Be sure to include a main method in one of your classes that starts the crawl. Drop that jar file into the lib folder of your Nutch installation. You can then start crawling with a command like the following (assuming your PATH is set so that the nutch command can be found):

nutch com.xyz.YourCrawlerMain

where "com.xyz.YourCrawlerMain" is the fully qualified name of the main class that launches your crawl.

This will launch your crawler with the Nutch classpath correctly set.

For the configuration files, just update them directly in the conf folder of your Nutch installation.
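
Since the conf folder has to be reachable from the classpath for this to work, a quick sanity check before crawling can save some debugging. The following is only a sketch using plain Java, with no Nutch classes involved: ConfCheck and missingResources are hypothetical names, and it assumes the configuration files are looked up as classpath resources, which is my understanding of how Nutch finds them.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: reports which config files
// cannot be found on the current classpath.
public class ConfCheck {

    // Returns the subset of names that the given class loader cannot resolve.
    static List<String> missingResources(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            URL url = loader.getResource(name);
            if (url == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default to the file named in the question; pass other names as arguments.
        String[] names = args.length > 0 ? args : new String[] {"crawl-urlfilter.txt"};
        List<String> missing = missingResources(ConfCheck.class.getClassLoader(), names);
        if (missing.isEmpty()) {
            System.out.println("All config files are visible on the classpath.");
        } else {
            System.out.println("Not found on the classpath: " + missing);
        }
    }
}
```

If the class is packaged in the jar you dropped into lib, it should be possible to run it the same way as your crawler main, for example with "nutch ConfCheck crawl-urlfilter.txt".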

UPDATE

I'm working on something similar, and I was able to make Nutch work from my app with these settings: set the classpath to include the Nutch folder (so it can find the plugins) and the Nutch/conf folder, plus all jars from Nutch/lib and nutch.jar from the Nutch folder.

But beware if your app runs in a web container. I had to mess with the classpath to make it work...
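
The classpath described in this update can be assembled by hand, but a small helper makes it less error-prone. Here is a rough sketch in plain Java: NutchClasspath and build are hypothetical names, and matching any jar in the Nutch folder whose name starts with "nutch" is an assumption, made because the exact file name may differ between releases.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch: builds a classpath string
// from the layout described above.
public class NutchClasspath {

    static String build(Path nutchHome) throws IOException {
        List<String> entries = new ArrayList<>();
        entries.add(nutchHome.toString());                  // Nutch folder, so plugins can be found
        entries.add(nutchHome.resolve("conf").toString());  // configuration files

        // Every jar under lib, in a stable order.
        Path lib = nutchHome.resolve("lib");
        if (Files.isDirectory(lib)) {
            try (Stream<Path> files = Files.list(lib)) {
                files.filter(p -> p.getFileName().toString().endsWith(".jar"))
                     .sorted()
                     .forEach(p -> entries.add(p.toString()));
            }
        }

        // The Nutch jar itself; the "nutch*.jar" pattern is an assumption.
        try (Stream<Path> files = Files.list(nutchHome)) {
            files.filter(p -> {
                     String name = p.getFileName().toString();
                     return name.startsWith("nutch") && name.endsWith(".jar");
                 })
                 .sorted()
                 .forEach(p -> entries.add(p.toString()));
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // Expects the Nutch installation folder as the only argument.
        System.out.println(build(Paths.get(args[0])));
    }
}
```

The printed string can be passed to java -cp together with your own application jar. In a web container this approach generally does not apply as-is, since the container manages its own class loaders.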
