开发者

Is there a way to find all the pages' link by a URL?

If I have a link say http://开发者_StackOverflowyahoo.com/ so can I get the links inside yahoo? For example, I have a website http://umair.com/ and I know there are just 5 pages Home, About, Portfolio, FAQ, Contact so can I get links as follows programmatically?

http://umair.com/index.html
http://umair.com/about.html
http://umair.com/portfolio.html
http://umair.com/faq.html
http://umair.com/contact.html


Define what you mean by "links inside yahoo".

Do you mean all pages for which there is a link to on the page returned by "http://www.yahoo.com"? If so, you could read the HTML returned by an HTTP GET request, and parse through it looking for <a> elements. You could use the "HTML Agility Pack" for help.

If you mean, "All pages on the server at that domain", probably not. Most websites define a default page which you get when you don't explicitly request one. (for example, requesting http://umair.com almost certainly returns http://umair.com/index.html). Very few website don't define a default, and they will return a list of files.

If you mean, "All pages on the server at that domain, even if they define a default page", no that cannot be done. It would be an extreme breach of security.


This could be done by a Web Crawler, read some basic information about it:

http://en.wikipedia.org/wiki/Web_crawler

Includes Open Source crawlers, see if any of them is what you are looking for.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜