
Nutch crawl path

I would like to know how to make Nutch restrict its crawl not just to the domain I specified, but to a specific directory path within that domain. I know that this can be configured in regex-urlfilter.txt.


Rules like these in regex-urlfilter.txt should crawl only the domain/path you want:

+.*www\.domain\.com/yourpath/.*
#skip everything else
-.*
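Order matters here: the comments in Nutch's default regex-urlfilter.txt state that the first matching pattern decides whether a URL is included or ignored, and that a URL matching no pattern is ignored. The small Java sketch below emulates that rule format to show which URLs these three lines let through. It is a hypothetical stand-alone class (UrlFilterDemo), not Nutch's own filter plugin; www.domain.com and yourpath are the answer's placeholders, substring-style matching (Matcher.find) is assumed, and Java 16+ is assumed for the record type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical emulation of the regex-urlfilter.txt rule format, for illustration only.
// It is not Nutch's RegexURLFilter; it only mimics "first matching rule wins".
public class UrlFilterDemo {

    // One parsed rule: '+' means include, '-' means exclude.
    private record Rule(boolean include, Pattern pattern) {}

    // The same three lines as the answer (placeholder domain and path).
    private static final String RULES = String.join("\n",
            "+.*www\\.domain\\.com/yourpath/.*",
            "#skip everything else",
            "-.*");

    private static final List<Rule> PARSED = parse(RULES);

    // Skip blank and '#' comment lines; the first character is the sign, the rest is the regex.
    static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // The first rule whose regex is found in the URL decides; no match at all means rejected.
    static boolean accepts(String url) {
        for (Rule rule : PARSED) {
            if (rule.pattern().matcher(url).find()) {
                return rule.include();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.domain.com/yourpath/page.html",
            "http://www.domain.com/otherpath/page.html",
            "http://www.example.org/yourpath/page.html"
        };
        for (String url : urls) {
            System.out.println((accepts(url) ? "CRAWL " : "SKIP  ") + url);
        }
        // Prints:
        // CRAWL http://www.domain.com/yourpath/page.html
        // SKIP  http://www.domain.com/otherpath/page.html
        // SKIP  http://www.example.org/yourpath/page.html
    }
}
```

Note that the include pattern requires the slash after yourpath, so http://www.domain.com/yourpath (without it) is rejected by these rules; a seed URL written as http://www.domain.com/yourpath/ avoids that.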
