I'm getting started with crawler4j (http://code.google.com/p/crawler4j/), and a simple test crawl of a site succeeded.
I need a web spider that finds certain links using regular expressions. The spider would visit a list of websites, find links that match a list of regex patterns, visit those matched links, and repeat until the configured
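The visit, match, and repeat loop described above can be sketched as a breadth-first crawl that only follows links matching one of the patterns. This is a minimal Python sketch under assumptions: `regex_crawl` is a hypothetical name, `get_links(url)` is a caller-supplied function that returns the absolute URLs on a page (so the crawl logic can be tested without network access), and `max_depth` is an assumed stand-in for whatever limit is configured.

```python
import re
from collections import deque
from typing import Callable, Iterable, List


def regex_crawl(
    seeds: Iterable[str],
    patterns: Iterable[str],
    get_links: Callable[[str], List[str]],
    max_depth: int = 2,
) -> List[str]:
    """Breadth-first crawl that only follows links matching one of `patterns`.

    Hypothetical helper: `get_links(url)` must return absolute URLs, and
    `max_depth` is an assumed stand-in for the configured crawl limit.
    """
    compiled = [re.compile(p) for p in patterns]
    seeds = list(seeds)
    seen = set(seeds)   # URLs already queued, so no page is visited twice
    matched = []        # matching links, in the order they were discovered
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages at or beyond the depth limit
        for link in get_links(url):
            if link in seen:
                continue
            # Follow the link only if at least one pattern matches it
            if any(rx.search(link) for rx in compiled):
                seen.add(link)
                matched.append(link)
                queue.append((link, depth + 1))
    return matched
```

For a real crawl, `get_links` could download each page and run an HTML link extractor over it; keeping fetching separate also makes it easy to add throttling later.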
I have some JS running on a page that pops up a modal localisation select box. I would like to prevent this from happening for bots/crawlers. Is there a way to do this using Modernizr
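As far as I know, Modernizr detects browser features rather than identifying who the client is, so a common alternative is a User-Agent heuristic. The sketch below is in Python and assumes the check runs server-side, deciding whether to render the modal at all; this is a swapped-in technique, not a Modernizr feature. `should_show_locale_modal` and the list of bot substrings in `BOT_UA_PATTERN` are hypothetical and deliberately incomplete, and User-Agent strings can be spoofed.

```python
import re
from typing import Optional

# Hypothetical, deliberately incomplete list of substrings common in crawler User-Agents.
BOT_UA_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)


def should_show_locale_modal(user_agent: Optional[str]) -> bool:
    """Return False when the User-Agent looks like a crawler (heuristic only)."""
    if not user_agent:
        return False  # assumption: treat a missing User-Agent as non-human
    return BOT_UA_PATTERN.search(user_agent) is None
```

The same regular expression could also be tested against `navigator.userAgent` in the page's own JS before opening the modal.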
I am creating a new web crawler in C# to crawl some specific websites. Everything goes fine, but the problem is that some websites block my crawler's IP address after a number of requests. I tried u
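One common way to make a crawler less likely to be blocked is to rate-limit requests per host. To keep all examples here in one language this is a Python sketch rather than C#; `HostThrottle` and its default `min_interval` of 2 seconds are hypothetical choices, not values any site publishes. The clock and sleep functions are injectable so the logic can be tested without real waiting. Throttling only lowers the request rate; it does not guarantee a site will stop blocking you.

```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Enforce a minimum delay between consecutive requests to the same host.

    Hypothetical helper; `min_interval` is in seconds.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time at which its last request was allowed

    def wait(self, url: str) -> float:
        """Block until `url`'s host may be requested again; return seconds slept."""
        host = urlsplit(url).hostname or ""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

A crawler would call `throttle.wait(url)` immediately before each download; requests to different hosts are not delayed by each other.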
I'd like to perform data mining on a large scale. For this, I need a fast crawler. All I need is something that downloads a web page, extracts links, and follows them recursively, but without visiting t
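The "extract links" step can be done with Python's standard-library `html.parser`. This is a minimal sketch; `LinkExtractor` and `extract_links` are hypothetical names. It resolves relative `href` values against the page URL and strips `#fragment` parts, which is one small piece of the URL normalization a crawler typically needs before checking a set of already-visited URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute, fragment-free URLs from <a href="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list:
    """Return the links found in `html`, resolved against `base_url`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

`html.parser` is lenient and dependency-free but not the fastest option; for large-scale crawling, download concurrency usually matters more than parsing speed.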
I'm trying to get the contents of a webpage as a string, and I found this question addressing how to write a basic web crawler, which claims to (and seems to) handle the encoding issue,
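A common approach to the encoding issue is to decode the response bytes with the charset named in the `Content-Type` header, falling back to a default when none is given. This Python sketch assumes UTF-8 as that default and uses `email.message.Message.get_content_charset()` to parse the header; `decode_body` is a hypothetical name. It ignores charsets declared only in an HTML `<meta>` tag, which a fuller solution would also check.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode an HTTP response body using the charset from its Content-Type header."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default  # None when no charset is given
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # The header names a charset Python does not know: fall back to the default
        return raw.decode(default, errors="replace")
```

When fetching with `urllib.request.urlopen`, the response's `headers` object is already an `email.message.Message` subclass, so `resp.headers.get_content_charset()` can be called on it directly.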
Update: I already use a FixedThreadPool. What happens is that each thread opens one connection to one site. What I want is something asynchronous.
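The difference between one thread per connection and asynchronous I/O can be sketched with Python's `asyncio`: all requests share one event loop, and an `asyncio.Semaphore` caps how many are in flight at once. Python's standard library has no async HTTP client, so `fetch` is a hypothetical coroutine function you would supply (for example, one built on a third-party async HTTP library); `fetch_all` and the default limit of 10 are also assumptions. In Java 11 and later, `java.net.http.HttpClient.sendAsync` offers a comparable non-blocking style.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 10,
) -> Dict[str, object]:
    """Fetch URLs concurrently on one event loop, at most `max_concurrency` at once.

    Each result is the fetched body, or the exception raised for that URL.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    results: Dict[str, object] = {}

    async def worker(url: str) -> None:
        async with semaphore:  # waits without tying up an OS thread
            try:
                results[url] = await fetch(url)
            except Exception as exc:  # one failure should not cancel the others
                results[url] = exc

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

Run it with `asyncio.run(fetch_all(urls, my_fetch))`; the semaphore plays the role that the pool size plays in a FixedThreadPool, without dedicating a thread to each open connection.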
Just to make things clear: I'm trying to figure out how to build a website with a language chooser.
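A language chooser usually needs a sensible initial choice, and one common source is the HTTP `Accept-Language` request header. This Python sketch parses entries such as `fr-CH, fr;q=0.9, en;q=0.8`, orders them by quality value, and falls back from a regional tag to its primary language (`fr-CH` to `fr`). `choose_language` is a hypothetical name, and the matching is a simplification rather than a full implementation of the standard language-range matching rules.

```python
from typing import List


def choose_language(accept_language: str, supported: List[str], default: str) -> str:
    """Pick the best supported language for an Accept-Language header value."""
    candidates = []
    for index, part in enumerate(accept_language.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag or tag == "*":
            continue  # skip empty entries and the wildcard
        q = 1.0  # quality defaults to 1 when no q= parameter is present
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            candidates.append((-q, index, tag))
    supported_lower = {s.lower(): s for s in supported}
    # Highest q first; ties keep the order they appeared in the header
    for _neg_q, _index, tag in sorted(candidates):
        if tag in supported_lower:
            return supported_lower[tag]
        primary = tag.split("-")[0]
        if primary in supported_lower:
            return supported_lower[primary]
    return default
```

An explicit choice made in the chooser is typically remembered (for example in a cookie or the URL path) and takes precedence over the header on later requests.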
HTML5 allows us to update the current URL without refreshing the browser. I've created a small framework on top of HTML5 that lets me leverage this transparently, so I can do all requests using