How do I extract URLs from this page?

I'm trying to use cURL to get some data from the web. What I have is a URL like somewebsite.com. On this website, there are a whole bunch of <div> elements that have class="control-element" and this markup:

<div class="control-element">
   <a href="http://someurl.com/and/some/path">Anchor Text</a>
</div>

How should I extract the URL and the anchor text for each of these links? Should I be using regex for this, or what's the best way to do it?


I think in this particular case you could be just fine using file_get_contents() instead of cURL.
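As a rough sketch of that simpler approach: the helper below wraps file_get_contents() with a timeout and a user agent. The function name fetch_html, the 10-second default, and the user-agent string are all made up for illustration. Note that file_get_contents() can only open URLs when allow_url_fopen is enabled in php.ini; if it is disabled on your host, cURL is the usual fallback.

```php
<?php
// Hypothetical helper (not from the original answer): returns the page body,
// or null if the request failed.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,       // give up after this many seconds
            'user_agent' => 'example-fetcher/1.0', // placeholder value
        ],
    ]);

    // @ suppresses the warning; failure is reported through the return value instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

You would call it as $doc = fetch_html("http://someurl.com/"); and check for null before parsing.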

For HTML parsing, take a look at Simple HTML DOM.

If you don't want to use any third-party libraries, here is an example using regex:

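// Fetch the page; file_get_contents() returns false on failure.
$doc = file_get_contents("http://someurl.com/");
if ($doc === false) {
    die("Could not fetch the page");
}
// The U modifier makes the quantifiers lazy; s lets . match newlines.
preg_match_all('/<div class="control-element">(.*)<\/div>/isU', $doc, $blocks);
foreach ($blocks[1] as $block) {
    // Take the first link in each block; skip blocks that contain none.
    if (preg_match('/<a href="(.*)">(.*)<\/a>/isU', $block, $m)) {
        echo "URL: " . htmlspecialchars($m[1]) . " Anchor: " . htmlspecialchars($m[2]) . "<br>";
    }
}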
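The regex above depends on the markup matching exactly (attribute order, double quotes, no nested divs). If that is too fragile, PHP's bundled DOM extension is another option that still avoids third-party libraries. Here is a minimal sketch using DOMDocument and DOMXPath; the function name extract_control_links and the shape of the returned array are my own choices, and it assumes a non-empty HTML string.

```php
<?php
// Hypothetical helper: returns a list of ['url' => ..., 'text' => ...] arrays,
// one per link found inside a div with the class "control-element".
function extract_control_links(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parse errors quietly
    // instead of emitting a warning for each one.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Match divs whose class list contains "control-element" (even next to
    // other classes), then every anchor with an href inside them.
    $query = '//div[contains(concat(" ", normalize-space(@class), " "), " control-element ")]//a[@href]';

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'url'  => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }

    return $links;
}

// Usage with the markup from the question:
$html = '<div class="control-element">
   <a href="http://someurl.com/and/some/path">Anchor Text</a>
</div>';

foreach (extract_control_links($html) as $link) {
    echo "URL: {$link['url']} Anchor: {$link['text']}\n";
}
// prints: URL: http://someurl.com/and/some/path Anchor: Anchor Text
```

Unlike the regex version, this returns every link in each block, not just the first one, and it does not care about whitespace or extra attributes on the tags.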