Nutch: crawling when the seed URLs fall in a range
Some sites have a URL pattern that runs from www.___.com/id=1 to www.___.com/id=1000. How can I crawl such a site using Nutch? Is there any way to provide the seeds for fetching as a range?
I think the easiest way would be to have a script generate your initial list of URLs.
No. You have to inject them, either manually or using a script.
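As a rough illustration of the script approach suggested above, here is a minimal Python sketch that writes one URL per id into a seed file. The host `www.example.com` is only a hypothetical stand-in for the site in the question, and the `urls/seed.txt` location and the function names are assumptions made for this example, not anything Nutch requires.

```python
from pathlib import Path


def generate_seed_urls(base_url, start, end):
    """Return one URL per id in the inclusive range [start, end]."""
    return [f"{base_url}{i}" for i in range(start, end + 1)]


def write_seed_file(urls, seed_dir="urls", filename="seed.txt"):
    """Write the URLs, one per line, to seed_dir/filename and return the path."""
    directory = Path(seed_dir)
    directory.mkdir(parents=True, exist_ok=True)
    seed_file = directory / filename
    seed_file.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return seed_file


if __name__ == "__main__":
    # Hypothetical base URL: replace with the real site's prefix.
    urls = generate_seed_urls("http://www.example.com/id=", 1, 1000)
    path = write_seed_file(urls)
    print(f"Wrote {len(urls)} seed URLs to {path}")
```

The resulting directory can then be injected in the usual way, for example with `bin/nutch inject crawl/crawldb urls` on Nutch 1.x (the `crawl/crawldb` path is again just an example). It is also worth checking `conf/regex-urlfilter.txt`: the default filter typically skips URLs containing characters such as `?` and `=`, so that rule may need relaxing for URLs of this form.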