Count Hyperlinks of a Website [duplicate]
开发者_StackOverflow中文版Possible Duplicate:
How to parse HTML with PHP?
i want to write a php-program that count all hyperlinks of a website, the user can enter.
how to do this? is there a libary or something which i can parse and analyze the html about the hyperlinks?
thanks for your help
Like this
<?php
$site = file_get_contents("someurl");
$links = substr_count($site, "<a href=");
print"There is {$links} in that page.";
?>
Well, we won't be able to give you a finite answer but only pointers. I've done a search engine once out of php so the principle will be the same:
- First of all you need to code your script as a console script, a web script is not really appropriate but it's all a question of tastes
- You need to understand how to work with sockets in PHP and make requests, look at the php socket library at: http://www.php.net/manual/ref.network.php
- You will need to get versed in the world of HTTP requests, learn how to make your own GET/POST requests and split the headers from the returned content.
- Last part will be easy with regexp, just preg_match the content for "#()*#i" (the last expression might be wrong, i didn't test it at all ok?)
- Loop the list of found hrefs, compare to already visited hrefs (remember to take into account wildcard GET params in your stuff) and then repeat the process to load all the pages of a site.
It IS HARD WORK... good luck
You may have to use CURL to fetech the contents of the webpage. Store that in a variable then parse it for hyperlinks. You might need regular expression for that.
精彩评论