Is there a way to find all of a site's page links from its URL?
If I have a link, say http://yahoo.com/, can I get the links inside yahoo? For example, I have a website http://umair.com/ and I know it has just 5 pages: Home, About, Portfolio, FAQ and Contact. Can I get their links programmatically, as follows?
http://umair.com/index.html
http://umair.com/about.html
http://umair.com/portfolio.html
http://umair.com/faq.html
http://umair.com/contact.html
Define what you mean by "links inside yahoo".
Do you mean all pages that are linked to from the page returned by "http://www.yahoo.com"? If so, you could read the HTML returned by an HTTP GET request and parse through it looking for <a> elements. The "HTML Agility Pack" can help with that.
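As a minimal sketch of that idea (shown here in Python with only the standard library, rather than the .NET HTML Agility Pack mentioned above; the URL is just the asker's example and the class name is made up for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]

base = "http://umair.com/"  # the asker's example site
html = urlopen(base).read().decode("utf-8", errors="replace")

parser = LinkCollector()
parser.feed(html)

# Resolve relative hrefs against the page URL before printing them.
for href in parser.links:
    print(urljoin(base, href))
```

Note that this only finds pages the fetched page actually links to; anything not reachable through a link will not show up.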
If you mean, "All pages on the server at that domain", probably not. Most websites define a default page which you get when you don't explicitly request one. (for example, requesting http://umair.com almost certainly returns http://umair.com/index.html). Very few website don't define a default, and they will return a list of files.
If you mean, "All pages on the server at that domain, even if they define a default page", no that cannot be done. It would be an extreme breach of security.
This could be done by a web crawler; you can read some basic information about them here:
http://en.wikipedia.org/wiki/Web_crawler
That article also lists open-source crawlers, so see if any of them is what you are looking for.
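If it helps, here is a rough sketch of what such a crawler does, again in Python with the standard library only; the function name, the 50-page cap, and the start URL are arbitrary choices for the example:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]

def crawl(start, limit=50):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start).netloc
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Only follow links on the same host, up to the page cap.
            if urlparse(absolute).netloc == host and absolute not in seen and len(seen) < limit:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)

for page in crawl("http://umair.com/"):
    print(page)
```

Like any crawler, this can only discover pages that are reachable by following links from the start page.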