
How to configure Heritrix to log all encountered URLs, including those that are filtered out / not crawled?

I'm using Heritrix 3.1.1-snapshot to crawl and archive some website content. I need to log every URL encountered on each page it processes, including URLs that are configured not to be crawled.

I've been searching for a long time and haven't found anything useful :( I hope someone here can help. Thanks.


http://crawler.archive.org/articles/user_manual/config.html section 6.3.1.4 seems to answer your question.
