Extract all URLs from an entire website
I want to crawl a website using C# or VB.NET. The crawler should extract the URLs from each page and also follow those URLs, so that I end up with all the URLs on the website.
How can I write this?
What is a website in this case?
A local virtual directory? A static web page? Dynamic pages hosted somewhere?
Look at
wget --mirror
curl can fetch individual pages too, but it has no recursive mirroring mode, so you would have to follow the links yourself.
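As a rough sketch of the wget route: rather than mirroring the files, you could let wget walk the site in spider mode, log what it visits, and pull the URLs out of the log afterwards. The helper name extract_urls, the log file name crawl.log and the address https://example.com/ are placeholders of mine, not something wget provides, and the exact log layout may vary between wget versions, so the helper just greps for anything that looks like an http(s) URL.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): read a wget log on stdin and
# print each distinct http(s) URL found in it, one per line.
# Assumes a grep that supports -o (GNU and BSD grep both do).
extract_urls() {
  grep -oE 'https?://[^[:space:]"]+' | sort -u
}

# Assumed usage (needs network access; example.com and crawl.log are placeholders):
#   wget --spider --recursive --level=inf --no-verbose \
#        --output-file=crawl.log https://example.com/
#   extract_urls < crawl.log
```

From C# or VB.NET you could start a command like this as an external process and read the resulting list, or use it as a reference to check your own crawler's output against.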
Also, please read up on robots.txt before you start scraping the net :)