
Extract all URLs from an entire website

I want to crawl a website using C# or VB.NET. I'd like the crawler to extract the URLs from each web page and then follow those URLs, so that I can extract all the URLs from the whole website.

How can I write this?
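A minimal C# sketch of such a crawler, assuming .NET Core 3.0 or later (it uses a `using var` declaration). It does a breadth-first crawl with `HttpClient`, pulls `href` values out of each page with a regular expression (a simplification; a real HTML parser is more robust), records every URL it finds, and only follows links on the start URL's host. The names `SiteCrawler`, `ExtractLinks`, `CrawlAsync` and the `maxPages` limit are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Sketch of a same-host breadth-first crawler (hypothetical names, not a library API).
public static class SiteCrawler
{
    // Captures the value of href="..." or href='...' up to any '#' fragment.
    // A regex is only an approximation of HTML parsing.
    private static readonly Regex HrefRegex =
        new Regex("href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase);

    // Returns the absolute http/https URLs linked from an HTML string.
    public static List<Uri> ExtractLinks(string html, Uri baseUri)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefRegex.Matches(html))
        {
            // Decode entities such as &amp; that appear inside attribute values.
            string href = WebUtility.HtmlDecode(m.Groups[1].Value).Trim();
            Uri absolute;
            // Try the relative form first so "/path" is resolved against baseUri
            // (on Unix, "/path" could otherwise be parsed as an absolute file URI).
            if (Uri.TryCreate(href, UriKind.Relative, out Uri relative))
                absolute = new Uri(baseUri, relative);
            else if (!Uri.TryCreate(href, UriKind.Absolute, out absolute))
                continue;

            // Skip mailto:, javascript:, ftp: and similar non-web links.
            if (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps)
                links.Add(absolute);
        }
        return links;
    }

    // Breadth-first crawl: records every URL found, follows only same-host links,
    // and stops after maxPages successful downloads.
    public static async Task<HashSet<Uri>> CrawlAsync(Uri start, int maxPages = 100)
    {
        var seen = new HashSet<Uri> { start };
        var queue = new Queue<Uri>();
        queue.Enqueue(start);
        using var http = new HttpClient();
        int fetched = 0;

        while (queue.Count > 0 && fetched < maxPages)
        {
            Uri page = queue.Dequeue();
            string html;
            try
            {
                html = await http.GetStringAsync(page);
                fetched++;
            }
            catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
            {
                continue; // skip pages that fail to load or time out
            }

            foreach (Uri link in ExtractLinks(html, page))
            {
                // Add returns false for URLs already seen, so each page is queued once.
                if (seen.Add(link) && link.Host == start.Host)
                    queue.Enqueue(link);
            }
        }
        return seen;
    }
}
```

For example, `var urls = await SiteCrawler.CrawlAsync(new Uri("https://example.com/"));` would return the start URL plus every link discovered on up to 100 pages of that host. Off-site links are recorded but not followed, which keeps the crawl from wandering across the whole web; redirects, non-HTML responses and politeness delays are left out of this sketch.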


What is a website in this case?

A local virtual directory? A static web page? Dynamic pages hosted somewhere?

Look at

wget --mirror

curl may also be worth a look, although unlike wget it does not download recursively on its own.

Also, please read up about robots.txt before you start scraping the net :)
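In that spirit, a crawler can fetch `/robots.txt` from the site root before crawling and skip disallowed paths. Below is a deliberately simplified C# sketch (the `RobotsTxt` class and its methods are hypothetical names): it only honours the `User-agent: *` group and plain `Disallow` prefixes, and ignores `Allow` lines, `*`/`$` wildcards and `Crawl-delay`, which a production crawler should also handle.

```csharp
using System;
using System.Collections.Generic;

// Simplified robots.txt handling (sketch): only the "User-agent: *" group and
// Disallow prefix rules are supported.
public static class RobotsTxt
{
    // Collects the Disallow prefixes that apply to all user agents.
    public static List<string> ParseDisallows(string robotsTxt)
    {
        var disallows = new List<string>();
        bool inStarGroup = false;
        bool lastWasAgent = false;

        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            // Strip comments and surrounding whitespace (including '\r').
            string line = rawLine.Split('#')[0].Trim();
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                // Consecutive User-agent lines share one group of rules.
                bool isStar = value == "*";
                inStarGroup = lastWasAgent ? (inStarGroup || isStar) : isStar;
                lastWasAgent = true;
            }
            else
            {
                lastWasAgent = false;
                // An empty Disallow value means "allow everything", so skip it.
                if (inStarGroup && field == "disallow" && value.Length > 0)
                    disallows.Add(value);
            }
        }
        return disallows;
    }

    // True if the URL's path and query do not start with any disallowed prefix.
    public static bool IsAllowed(Uri url, List<string> disallows)
    {
        foreach (string prefix in disallows)
        {
            if (url.PathAndQuery.StartsWith(prefix, StringComparison.Ordinal))
                return false;
        }
        return true;
    }
}
```

A crawler would typically download `https://<host>/robots.txt` once at the start, parse it with `ParseDisallows`, and call `IsAllowed` before requesting each URL.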
