Nutch crawl path
I would like to know how to make Nutch restrict its crawl not only to the domain that I specified, but also to a directory path within that domain. I know that this can be configured in regex-urlfilter.txt.
With these rules in regex-urlfilter.txt, Nutch should crawl only the domain/path you want:
+.*www\.domain\.com/yourpath/.*
# skip everything else
-.*
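
To see why the order of the two rules matters, here is a rough Python sketch of how I understand the filter to evaluate them: rules are tried top to bottom, the first pattern found in the URL decides, `+` accepts and `-` rejects. This is not Nutch code. The `accepts` function and the `RULES` list are hypothetical names made up for illustration, and Python's `re.search` stands in for the Java regex matching that Nutch uses, which behaves the same for these simple patterns.

```python
import re

# The two rules from the filter above, in file order: (sign, pattern).
RULES = [
    ("+", r".*www\.domain\.com/yourpath/.*"),
    ("-", r".*"),
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


if __name__ == "__main__":
    for url in (
        "http://www.domain.com/yourpath/page.html",
        "http://www.domain.com/otherpath/page.html",
        "http://example.org/",
    ):
        print(url, "->", "crawl" if accepts(url) else "skip")
```

If the `-.*` line came first, it would match every URL and nothing would be crawled, so the `+` rule has to stay above it.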