Will Nutch, the spider, re-index web pages it already has in its index?

Does Nutch index pages again if they're already in the index? If so, how do I change this?


Yes and no. By default, Nutch re-indexes a page only after a certain period has passed: 1 month (from memory). If the page hasn't changed, Nutch increases the re-indexing interval, up to a maximum of 3 months by default. All of these settings are configurable in nutch-site.xml.
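
For illustration, an override in nutch-site.xml might look roughly like the sketch below. The property names db.fetch.interval.default and db.fetch.interval.max are the ones I recall from nutch-default.xml, with values given in seconds; the numbers here (7 days and 30 days) are made-up examples, so check the nutch-default.xml shipped with your version before relying on them.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name: re-fetch interval for a page, in seconds.
       604800 s = 7 days (illustrative value, not the shipped default). -->
  <property>
    <name>db.fetch.interval.default</name>
    <value>604800</value>
  </property>

  <!-- Assumed property name: upper bound on the re-fetch interval, in seconds.
       2592000 s = 30 days (illustrative value, not the shipped default). -->
  <property>
    <name>db.fetch.interval.max</name>
    <value>2592000</value>
  </property>
</configuration>
```

Values set in nutch-site.xml take precedence over those in nutch-default.xml, so only the properties you want to change need to appear here.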
