Nutch and sitemap.xml
Does Apache Nutch support sitemaps? Or how can I implement it myself? How can I use the priority field: should it be multiplied by the boost field?
Not that I'm aware of. Depending on the behaviour you expect, there are multiple possible implementations. Can you be more specific? For instance:
+ You can have newly submitted sitemap URLs 'injected' with a high score so that they get crawled earlier. For this, just add an inject command before starting a new crawl/fetch/index cycle.
+ You can create a scoring plug-in that boosts URLs found in a sitemap.
However, you cannot define recrawl periods at the URL level, as the sitemap would indicate. Nutch has a built-in function that recrawls URLs that change more often more frequently, and vice versa. You could still decide to boost the score of URLs with a frequent refresh rate, so that they get crawled earlier.
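To make the 'inject with a high score' idea concrete, here is a minimal sketch in plain Java. It does not use any Nutch classes: it only reads `<loc>` and `<priority>` from a sitemap and multiplies the priority by a base boost, as the question suggests. The class name `SitemapPriority` and its methods are made up for illustration. The `nutch.score` key on seed lines is, as far as I know, understood by the Nutch 1.x injector, but treat that as an assumption and check it against your version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Nutch.
class SitemapPriority {
    // The sitemaps.org protocol gives 0.5 as the default priority.
    static final double DEFAULT_PRIORITY = 0.5;

    // Returns URL -> priority (clamped to [0, 1]) for every <url> entry.
    static Map<String, Double> parsePriorities(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Sitemaps come from remote sites, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Map<String, Double> result = new LinkedHashMap<>();
        NodeList urls = doc.getElementsByTagName("url");
        for (int i = 0; i < urls.getLength(); i++) {
            Element url = (Element) urls.item(i);
            NodeList loc = url.getElementsByTagName("loc");
            if (loc.getLength() == 0) {
                continue; // an entry without <loc> is unusable
            }
            double priority = DEFAULT_PRIORITY;
            NodeList prio = url.getElementsByTagName("priority");
            if (prio.getLength() > 0) {
                try {
                    priority = Double.parseDouble(prio.item(0).getTextContent().trim());
                } catch (NumberFormatException e) {
                    // keep the default when the value is malformed
                }
            }
            priority = Math.max(0.0, Math.min(1.0, priority));
            result.put(loc.item(0).getTextContent().trim(), priority);
        }
        return result;
    }

    // One possible combination: plain multiplication of boost and priority.
    static double boostedScore(double baseBoost, double priority) {
        return baseBoost * priority;
    }

    // Seed-file line: URL, a tab, then the score as metadata.
    // Assumption: the injector reads the nutch.score key.
    static String toSeedLine(String url, double score) {
        return String.format(Locale.ROOT, "%s\tnutch.score=%.2f", url, score);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
                + "<url><loc>http://example.com/a</loc><priority>0.8</priority></url>"
                + "<url><loc>http://example.com/b</loc></url>"
                + "</urlset>";
        double baseBoost = 2.0; // arbitrary value chosen for this example
        for (Map.Entry<String, Double> e : parsePriorities(xml).entrySet()) {
            System.out.println(toSeedLine(e.getKey(), boostedScore(baseBoost, e.getValue())));
        }
    }
}
```

The printed lines could be written to a seed file and passed to the inject command before the next cycle. Multiplication is only one choice; since priority is relative within a single site, you may prefer to use it just to order URLs from the same host.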
It looks like Nutch supports sitemaps now. I found this on the following wiki page:
https://wiki.apache.org/nutch/SitemapFeature