Is it possible to build this type of program in PHP?
I want to build a QA program that will crawl all the pages of a site (all files under a specified domain name) and return all external links on the site that don't open in a new window (that is, links whose <a> tags lack the target="_blank" attribute).
I can write a PHP or JavaScript script that opens external links in new windows, or that reports all the problem links on a single page (the same page the script is in), but what I want is for the QA tool to go through all the pages of a website and report back what it finds.
This "spidering" i开发者_运维技巧s what I have no idea how to do, and am not sure if it's even possible to do with a language like PHP. If it's possible how can I go about it?
Yes, it is. You can use a function like fopen/fread, or simply file_get_contents, to read the HTML of a given URL into a string; then use DOMDocument::loadHTML to parse it, and DOMXPath to get a list of all <a> elements and their attributes (target, href).
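For example, here is a minimal sketch of that approach. The URL is a placeholder, and the "external link" test is just a simple host comparison:

<?php
// Fetch one page and report external links that won't open in a new window.
$url  = 'http://example.com/somepage.html';   // placeholder page URL
$html = file_get_contents($url);
if ($html === false) {
    die("Could not fetch $url\n");
}

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid
$doc->loadHTML($html);
libxml_clear_errors();

$xpath    = new DOMXPath($doc);
$pageHost = parse_url($url, PHP_URL_HOST);

foreach ($xpath->query('//a[@href]') as $a) {
    $href     = $a->getAttribute('href');
    $linkHost = parse_url($href, PHP_URL_HOST);

    // External = absolute URL on a different host; report it if it
    // does not open in a new window.
    if ($linkHost && $linkHost !== $pageHost
        && $a->getAttribute('target') !== '_blank') {
        echo "Problem link on $url: $href\n";
    }
}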
Yes, it's very much possible to do this with PHP. Try using cURL to fetch each page, and a regex (more specifically, the preg_match_all function) to filter out the links.
More on cURL here: PHP: cURL - Manual. More on regex here: PHP: preg_match_all - Manual.
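A rough sketch of that idea follows. The URL is a placeholder and the regex is deliberately crude; see the caveat in the next answer about parsing HTML with regexes:

<?php
// Fetch a page with cURL, then pull out <a> tags with preg_match_all.
$ch = curl_init('http://example.com/');      // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

// Capture every <a ...> opening tag and its href value.
preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\'][^>]*>/i', $html, $m);

foreach ($m[0] as $i => $tag) {
    $href = $m[1][$i];
    // Flag absolute links whose tag lacks target="_blank"
    // (crude: misses single-quoted attribute values).
    if (preg_match('#^https?://#i', $href)
        && stripos($tag, 'target="_blank"') === false) {
        echo "No target=\"_blank\": $href\n";
    }
}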
Regexes are likely to fail or turn up false positives. Use PHP's DOMDocument class and/or XPath to find the links on a given page.
http://us.php.net/manual/en/book.dom.php http://php.net/manual/en/class.domxpath.php
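As a self-contained illustration, the target="_blank" check can even live in the XPath query itself instead of a PHP loop:

<?php
// Two test links: only the first lacks target="_blank".
$doc = new DOMDocument();
$doc->loadHTML(
    '<a href="http://other.example/a">no target</a>' .
    '<a href="http://other.example/b" target="_blank">has target</a>'
);

$xpath = new DOMXPath($doc);
// Select <a> elements that have an href but no target="_blank".
foreach ($xpath->query('//a[@href and not(@target="_blank")]') as $a) {
    echo $a->getAttribute('href'), "\n";   // prints only .../a
}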
http://www.phpclasses.org/package/5439-PHP-Crawl-a-site-and-retrieve-the-the-URL-of-all-links.html provides a class to crawl/spider a site and retrieve the URLs of all links. You can modify the script to check whether each page is valid, using cURL or file_get_contents (as mentioned above).
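Here is a very small spider sketch in the spirit of that class, built on the DOM approach from the earlier answers. The start URL is a placeholder, the relative-URL resolution is deliberately naive, and real code would also want error handling, politeness delays, and robots.txt checks:

<?php
// Breadth-first crawl of one host, reporting external links that
// lack target="_blank" along the way.
$start = 'http://example.com/';              // placeholder start URL
$host  = parse_url($start, PHP_URL_HOST);
$queue = [$start];
$seen  = [$start => true];

while ($queue) {
    $url  = array_shift($queue);
    $html = @file_get_contents($url);
    if ($html === false) {
        continue;                            // skip pages that fail to load
    }

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $a) {
        $href = $a->getAttribute('href');
        if (preg_match('/^(mailto:|javascript:|#)/i', $href)) {
            continue;                        // not a crawlable page link
        }

        // Naive absolutisation: prepend the site root to relative paths.
        $linkHost = parse_url($href, PHP_URL_HOST);
        $abs      = $linkHost ? $href
                  : rtrim($start, '/') . '/' . ltrim($href, '/');

        if (parse_url($abs, PHP_URL_HOST) === $host) {
            if (!isset($seen[$abs])) {       // internal: crawl it once
                $seen[$abs] = true;
                $queue[]    = $abs;
            }
        } elseif ($a->getAttribute('target') !== '_blank') {
            echo "$url -> $abs\n";           // external link, no new window
        }
    }
}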