I'm working on a crawler and need to understand exactly what is meant by "link depth". Take Nutch for example: http://wiki.apache.org/nutch/NutchTutorial
I need to write a crawler to extract some info from a few pre-selected websites only. I know this is a straightforward job, but I am thinking of using Google App Engine to get this done.
I'm working on a project where I need a mature crawler to do some work, and I'm evaluating Nutch for this purpose.
I'm trying to run Nutch on my Windows machine. I have Nutch, Java, Tomcat, and Cygwin installed. When I try to run the crawl command in Cygwin, I get the following error:
Can I use a MapReduce framework to create an index and somehow add it to a distributed Solr? I have a burst of information (logfiles and documents) that will be transported over the internet and stor
We're about to start a project consisting of a search engine website. We need to implement a site that has social functionality on top of its core search engine solution. Obviously, we need to choose
I configured Nutch with the following in my conf/nutch-site.xml: <property>
I'm trying to profile Nutch using VisualVM. Lucene is the part of the Nutch core responsible for generating URL indexes and for searching these indexes in response to a query. I'm r
Does Apache Nutch support sitemaps? Or how can I implement it myself? How can I use the priority field; should it be multiplied with the boost field? Not that I'm aware of.