Is there a way to find all of a site's page links from its URL?
If I have a link, say http://yahoo.com/, can I get the links inside yahoo? For example, I have a website http://umair.com/ and I know it has just 5 pages: Home, About, Portfolio, FAQ and Contact. Can I get their links programmatically, as follows?
http://umair.com/index.html
http://umair.com/about.html
http://umair.com/portfolio.html
http://umair.com/faq.html
http://umair.com/contact.html
Define what you mean by "links inside yahoo".
Do you mean all pages that are linked to from the page returned by "http://www.yahoo.com"? If so, you could read the HTML returned by an HTTP GET request and parse through it looking for <a> elements. The "HTML Agility Pack" can help with that.
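As a minimal sketch of that idea (shown here in Python with only the standard library, rather than the .NET HTML Agility Pack mentioned above; the URL is just the asker's example and the class name is made up for illustration):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]

base = "http://umair.com/"  # the asker's example site
html = urlopen(base).read().decode("utf-8", errors="replace")

parser = LinkCollector()
parser.feed(html)

# Resolve relative hrefs against the page URL before printing them.
for href in parser.links:
    print(urljoin(base, href))
```

Note that this only finds pages the fetched page actually links to; anything not reachable through a link will not show up.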
If you mean, "All pages on the server at that domain", probably not. Most websites define a default page which you get when you don't explicitly request one. (for example, requesting http://umair.com almost certainly returns http://umair.com/index.html). Very few website don't define a default, and they will return a list of files.
If you mean, "All pages on the server at that domain, even if they define a default page", no that cannot be done. It would be an extreme breach of security.
This could be done by a web crawler; you can read some basic information about them here:
http://en.wikipedia.org/wiki/Web_crawler
That article also lists open-source crawlers, so see if any of them is what you are looking for.
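If it helps, here is a rough sketch of what such a crawler does, again in Python with the standard library only; the function name, the 50-page cap, and the start URL are arbitrary choices for the example:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href value of every <a> element it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [value for name, value in attrs if name == "href" and value]

def crawl(start, limit=50):
    """Breadth-first crawl that stays on the start URL's host."""
    host = urlparse(start).netloc
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except Exception:
            continue  # skip pages that fail to load
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            # Only follow links on the same host, up to the page cap.
            if urlparse(absolute).netloc == host and absolute not in seen and len(seen) < limit:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)

for page in crawl("http://umair.com/"):
    print(page)
```

Like any crawler, this can only discover pages that are reachable by following links from the start page.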