Nutch + MySQL integration
When Nutch runs its cycle (crawl, fetch, parse, index), I do not want the index phase to write a Lucene index. Instead, I want Nutch to hand all the crawled data (I believe it keeps them as NutchDocument objects) to my own code, which will store it in MySQL.
Is there any way to do this?
Thanks
Create your own Java class that manages the Nutch cycle. It should be similar to org.apache.nutch.crawl.Crawl, but you will have to replace the call to the indexer with a call to your MySQL connector. Alternatively, you can call your MySQL connector during each cycle, depending on whether you want to update MySQL at the end of the crawl or while it is happening.
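To make the idea concrete, here is a rough sketch of the shape such a class could take. It deliberately does not call any Nutch API: Page, PageSink, JdbcPageSink and runCrawl are all hypothetical names made up for illustration, and each "round" stands in for one fetch/parse cycle of your own driver. The table name pages, its columns, and the UNIQUE key on url are assumptions too. Only the java.sql calls are real.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MysqlSinkSketch {

    /** Hypothetical stand-in for one parsed page; NOT a Nutch class. */
    static final class Page {
        final String url;
        final String title;
        final String content;

        Page(String url, String title, String content) {
            this.url = url;
            this.title = title;
            this.content = content;
        }
    }

    /** The hook your driver calls where Crawl would call the indexer. */
    interface PageSink {
        void store(List<Page> pages);
    }

    /** Assumed schema: table `pages` with a UNIQUE key on `url`. */
    static final String UPSERT_SQL =
        "INSERT INTO pages (url, title, content) VALUES (?, ?, ?) "
        + "ON DUPLICATE KEY UPDATE title = VALUES(title), content = VALUES(content)";

    /** Writes pages to MySQL through plain JDBC, one batch per call. */
    static final class JdbcPageSink implements PageSink {
        private final Connection connection;

        JdbcPageSink(Connection connection) {
            this.connection = connection;
        }

        @Override
        public void store(List<Page> pages) {
            try (PreparedStatement statement = connection.prepareStatement(UPSERT_SQL)) {
                for (Page page : pages) {
                    statement.setString(1, page.url);
                    statement.setString(2, page.title);
                    statement.setString(3, page.content);
                    statement.addBatch();
                }
                statement.executeBatch();
            } catch (SQLException e) {
                throw new IllegalStateException("Could not store crawled pages", e);
            }
        }
    }

    /**
     * Walks the rounds and hands pages to the sink either after every
     * round or once at the end. Returns the number of store() calls made.
     */
    static int runCrawl(List<List<Page>> rounds, PageSink sink, boolean storeEachRound) {
        List<Page> pending = new ArrayList<>();
        int storeCalls = 0;
        for (List<Page> round : rounds) {
            pending.addAll(round);
            if (storeEachRound && !pending.isEmpty()) {
                sink.store(new ArrayList<>(pending)); // update MySQL while crawling
                pending.clear();
                storeCalls++;
            }
        }
        if (!pending.isEmpty()) {
            sink.store(pending); // update MySQL once, at the end of the crawl
            storeCalls++;
        }
        return storeCalls;
    }

    public static void main(String[] args) {
        // In-memory sink so the sketch runs without a database.
        List<Page> stored = new ArrayList<>();
        PageSink inMemory = stored::addAll;

        List<List<Page>> rounds = Arrays.asList(
            Arrays.asList(new Page("http://example.com/", "Home", "hello")),
            Arrays.asList(new Page("http://example.com/a", "A", "world")));

        int calls = runCrawl(rounds, inMemory, true);
        System.out.println("store calls: " + calls + ", pages stored: " + stored.size());
    }
}
```

To write to a real database you would pass new JdbcPageSink(DriverManager.getConnection(...)) instead of the in-memory sink, with the MySQL JDBC driver on the classpath. The storeEachRound flag corresponds to the choice above: true updates MySQL while the crawl is happening, false updates it once at the end.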