How do I get the link element in an HTML page with PHP?

First, I know that I can get the HTML of a webpage with:

file_get_contents($url);

What I am trying to do is get a specific link element in the page (found in the head).

e.g:

<link type="text/plain" rel="service" href="/service.txt" /> (the element could close with just >)

My question is: How can I get that specific element with the "rel" attribute equal to "service" so I can get the href?

My second question is: Should I also get the "base" element? Does it apply to the "link" element? I am trying to follow the standard.

Also, the HTML might have errors. I don't have control over how my users code their stuff.


Using PHP's DOMDocument, this should do it (untested):

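$html = file_get_contents($url);

// Real-world HTML is often malformed; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Search the whole document: a broken page may lack a proper <head>.
foreach ($doc->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'service') {
        echo $link->getAttribute('href');
    }
}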
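
If the rel attribute might carry several space-separated values (for example rel="service alternate") or use different casing, a token-based match is safer than a plain string comparison. Here is a rough sketch using DOMXPath; findLinkHrefs is a name I made up for illustration, not part of any library:

```php
<?php
// Hypothetical helper: returns the href of every <link> whose rel attribute
// contains the given token (case-insensitive, space-separated list).
function findLinkHrefs(string $html, string $rel): array
{
    // Collect libxml parse warnings internally so malformed HTML stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//link[@rel][@href]') as $link) {
        // Split rel into lowercase tokens and look for the one we want.
        $tokens = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array(strtolower($rel), $tokens, true)) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Sample input with mixed-case rel, a second token, and an unclosed <p>.
$html = '<html><head><link type="text/plain" rel="Service alternate" href="/service.txt">'
      . '</head><body><p>unclosed</body></html>';
print_r(findLinkHrefs($html, 'service'));
```

It returns an array because nothing stops a page from declaring more than one matching link; pick the first entry if you only expect one.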


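You should get the base element too, but know how it works and its scope: a base element with an href changes how every relative URL in the document is resolved, and that includes the href of a link element.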
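
To make the base point concrete, here is a minimal sketch of resolving the link's href against the base element, if there is one, or against the page URL otherwise. resolveUrl and findServiceUrl are hypothetical helpers written for this example, and the resolver is deliberately simplified: it only handles absolute, scheme-relative, root-relative, and path-relative references, and it assumes the reference carries no query string or fragment.

```php
<?php
// Hypothetical, simplified resolver: $base must be an absolute URL.
function resolveUrl(string $base, string $rel): string
{
    if ($rel === '') {
        return $base;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $rel)) {
        return $rel; // already absolute
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    if (strpos($rel, '//') === 0) {
        return $scheme . ':' . $rel; // scheme-relative reference
    }
    $authority = $scheme . '://' . ($b['host'] ?? '')
               . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    // Root-relative references replace the path; others replace its last segment.
    $path = $rel[0] === '/'
        ? $rel
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $rel;

    // Collapse "." and ".." segments without climbing above the root.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $authority . implode('/', $out);
}

// Hypothetical helper: absolute URL of the first <link rel="service">, or null.
function findServiceUrl(string $html, string $pageUrl): ?string
{
    $previous = libxml_use_internal_errors(true); // tolerate malformed HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The first <base> with an href wins; it may itself be relative to the page.
    $baseNode = $xpath->query('//base[@href]')->item(0);
    $base = $baseNode ? resolveUrl($pageUrl, $baseNode->getAttribute('href')) : $pageUrl;

    $link = $xpath->query('//link[@rel="service"][@href]')->item(0);
    return $link ? resolveUrl($base, $link->getAttribute('href')) : null;
}

$html = '<html><head><base href="/docs/"><link rel="service" href="service.txt">'
      . '</head><body></body></html>';
echo findServiceUrl($html, 'http://example.com/index.html'), "\n";
```

With the sample input this prints http://example.com/docs/service.txt; without the base element the same href would resolve to http://example.com/service.txt. For production use, a tested URL-resolution library is probably a better choice than this hand-rolled resolver.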

In truth, when I have to screen-scrape, I use phpQuery. It is an older PHP port of jQuery, and while that may sound like something of a dumb concept, it is awesome for document traversal and doesn't require well-formed XHTML.

http://code.google.com/p/phpquery/


I'm working with Selenium under Java for web application testing. It provides very nice features for document traversal using CSS selectors.

Have a look at "How to use Selenium with PHP".
But this setup might be too complex for your needs if you only want to extract this one link.
