Will Nutch, the spider, index webpages it already has in its index?
Does Nutch index pages again if they're already in the index? If so, how do I change this?
Yes and no. By default, Nutch re-indexes a page only after a certain period (one month, from memory). If the page hasn't changed, it increases the re-indexing interval, up to a maximum that is three months by default. All of these settings are configurable in nutch-site.xml.
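As a rough sketch, overrides in nutch-site.xml might look something like the following. The property names are the ones I remember from the nutch-default.xml of Nutch 1.x, and the values (in seconds) are only illustrative, so check the nutch-default.xml that ships with your version before relying on them. As far as I know, the back-off for unchanged pages comes from the adaptive fetch schedule, which may need to be selected explicitly.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed property name (Nutch 1.x): default re-fetch interval in seconds.
       604800 s = 7 days, an illustrative value instead of the roughly one-month default. -->
  <property>
    <name>db.fetch.interval.default</name>
    <value>604800</value>
  </property>

  <!-- Assumed property name (Nutch 1.x): upper bound on the re-fetch interval in seconds.
       2592000 s = 30 days, an illustrative value instead of the roughly three-month default. -->
  <property>
    <name>db.fetch.interval.max</name>
    <value>2592000</value>
  </property>

  <!-- Assumption: this schedule class lengthens the interval for pages that
       have not changed and shortens it for pages that have. -->
  <property>
    <name>db.fetch.schedule.class</name>
    <value>org.apache.nutch.crawl.AdaptiveFetchSchedule</value>
  </property>
</configuration>
```

Values placed in nutch-site.xml take precedence over those in nutch-default.xml, so only the properties you want to change need to be listed.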