Getting Nutch to prioritize frequently updated pages?

Is there a way to get Nutch to crawl pages that are updated frequently more often?

E.g. index pages and feeds.

It would also be valuable to refresh new pages that contain comments more frequently during the first day after the page is created. Any tips are appreciated.


What you need is the Adaptive Fetch Schedule. I have written a blog post about how it works. Basically, this scheduler gradually shortens the re-fetch interval for pages that change often, so they get visited more and more regularly.
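To make the idea concrete, here is a rough Python sketch of the kind of interval adjustment an adaptive schedule performs. It is not Nutch's actual code: the function name `next_interval` is made up for illustration, the rates and bounds are illustrative values loosely modelled on the `db.fetch.schedule.adaptive.*` properties (`inc_rate`, `dec_rate`, `min_interval`, `max_interval`), and it leaves out refinements such as syncing to the page's modification time. In Nutch itself the scheduler is selected by setting `db.fetch.schedule.class` to `org.apache.nutch.crawl.AdaptiveFetchSchedule` in nutch-site.xml.

```python
def next_interval(interval, modified,
                  inc_rate=0.4, dec_rate=0.2,
                  min_interval=60.0, max_interval=31536000.0):
    """Return the next re-fetch interval in seconds.

    Hypothetical helper, not part of Nutch. All default values are
    illustrative assumptions.
    """
    if modified:
        # Page changed since the last fetch: come back sooner.
        interval *= (1.0 - dec_rate)
    else:
        # Page unchanged: back off and come back later.
        interval *= (1.0 + inc_rate)
    # Keep the interval inside the configured bounds.
    return max(min_interval, min(max_interval, interval))


if __name__ == "__main__":
    # Simulate a feed that has changed on every one of ten fetches,
    # starting from a 30-day interval.
    interval = 30 * 24 * 3600.0
    for fetch in range(1, 11):
        interval = next_interval(interval, modified=True)
        print(f"fetch {fetch}: next visit in {interval / 3600.0:.1f} hours")
```

With settings like these, an index page or feed that changes on every visit sees its interval shrink by a fixed fraction each time until it reaches the minimum, while a static page drifts towards the maximum. Lowering the minimum interval and raising the decrease rate is the usual lever for getting fresh, heavily commented pages revisited sooner.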
