Regular expression to match links containing "Google"

I want to use a PHP regular expression to match all the links that contain the word "google". I've tried this:

$url = "http://www.google.com";
$html = file_get_contents($url); 
preg_match_all('/<a.*(.*?)".*>(.*google.*?)<\/a>/i',$html,$links);
echo '<pre />';
print_r($links); // it should return 2 links 'About Google' & 'Go to Google English'

However it returns nothing. Why?


It is better to use XPath here:

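$url = "http://www.google.com";
$html = file_get_contents($url);

// Real-world HTML is rarely valid; keep libxml parse warnings out of the output
libxml_use_internal_errors(true);

$doc = new DOMDocument;
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Case-insensitive match on the link text: translate() lowercases G, O, L and E
$query = "//a[contains(translate(text(), 'GOOGLE', 'google'), 'google')]";
// or, for a case-sensitive match, just:
// $query = "//a[contains(text(),'Google')]";
$links = $xpath->query($query);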

$links will be a DOMNodeList that you can iterate over with foreach.
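A minimal sketch of consuming that DOMNodeList, wrapped in a function so it can run against an inline HTML string instead of a live fetch. The function name googleLinksViaXPath, the sample markup and its example.com hrefs are hypothetical, made up for illustration. One deliberate change from the query above: it uses "." (the string value of the whole <a> element) instead of text(), which only looks at the first text node and so misses link text wrapped in nested tags such as <span>.

```php
<?php
// Hypothetical helper: return text and href of every <a> whose text
// contains "google", case-insensitively, using DOMXPath.
function googleLinksViaXPath(string $html): array
{
    libxml_use_internal_errors(true); // silence warnings about malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // "." = full string value of the <a>, including text inside nested tags
    $query = "//a[contains(translate(., 'GOOGLE', 'google'), 'google')]";

    $result = [];
    foreach ($xpath->query($query) as $a) { // DOMNodeList is iterable
        $result[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $result;
}

// Hypothetical sample document standing in for file_get_contents($url)
$sample = '<p><a href="https://example.com/about">About Google</a>
<a href="https://example.com/search">Advanced search</a>
<a href="https://example.com/en">Go to Google English</a></p>';

foreach (googleLinksViaXPath($sample) as $link) {
    echo $link['text'], ' => ', $link['href'], PHP_EOL;
}
```

With a real page, pass the fetched $html instead of $sample.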


You should use a DOM parser, because using regular expressions on HTML documents can be painfully error-prone. Try something like this:

// Suppress libxml warnings about malformed real-world HTML
libxml_use_internal_errors(TRUE);

$url = "http://www.google.com";
$html = file_get_contents($url);

$doc = new DOMDocument();
$doc->loadHTML($html);
$n = 0;
foreach ($doc->getElementsByTagName('a') as $a) {
    // Print the anchor if its href contains the word 'google'.
    // strpos() returns 0 for a match at the very start, which is falsy,
    // so compare against false explicitly.
    if ($a->hasAttribute('href') && strpos($a->getAttribute('href'), 'google') !== false) {
        echo 'Anchor ' . ++$n . ': ' . $a->getAttribute('href') . '<br>';
    }
}
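The loop above tests the href, but the results the question expects ("About Google", "Go to Google English") are link texts. A minimal sketch of the same DOM approach that matches on the visible text instead, using textContent and the case-insensitive stripos() with an explicit !== false check, so a text that starts with "Google" (offset 0) is still counted. The helper name googleAnchorTexts and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: collect the visible text of every <a> containing $needle.
function googleAnchorTexts(string $html, string $needle = 'google'): array
{
    libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $text = trim($a->textContent); // includes text inside nested tags
        // stripos() ignores case; a match at offset 0 returns 0 (falsy),
        // so only false means "not found"
        if (stripos($text, $needle) !== false) {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Hypothetical sample document; "Google Maps" exercises the offset-0 case
$sample = '<a href="https://example.com/about">About Google</a>'
        . '<a href="https://example.com/help">Help</a>'
        . '<a href="https://example.com/maps">Google Maps</a>'
        . '<a href="https://example.com/en">Go to Google English</a>';

print_r(googleAnchorTexts($sample));
```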