Extract the entire URL using regex

Okay, I am using PHP's file_get_contents to read some websites. Each of these sites has only one Facebook link. After I fetch the whole page, I would like to find the complete Facebook URL.

So somewhere in the page there will be:

<a href="http://facebook.com/username" >

I want to get http://facebook.com/username, that is, everything from the first (") to the last ("). The username is variable (it could be username.somethingelse), and there could be other attributes before or after "href".

Just in case I am not being very clear:

<a href="http://facebook.com/username" >  //I want http://facebook.com/username
<a href="http://www.facebook.com/username" >  //I want http://www.facebook.com/username
<a class="value" href="http://facebook.com/username. some" attr="value" >  //I want http://facebook.com/username. some

Any of the examples above could also use single quotes:

<a href='http://facebook.com/username' > //I want http://facebook.com/username

Thanks to all


Don't use regex on HTML. It's a shotgun that'll blow off your leg at some point. Use DOM instead:

$dom = new DOMDocument;
$dom->loadHTML(...);
$xp = new DOMXPath($dom);

$a_tags = $xp->query("//a");
foreach($a_tags as $a) {
   echo $a->getAttribute('href');
}
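
If you only want the Facebook link rather than every href, the XPath query itself can do the filtering. Here is a minimal sketch of that idea; the $html string is made-up sample markup standing in for whatever file_get_contents returned, and the contains() test is a plain substring match, so treat it as a starting point rather than a strict check:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<a href="http://example.com/about">About</a>'
      . '<a class="value" href="http://facebook.com/username. some" attr="value">FB</a>'
      . '</body></html>';

$dom = new DOMDocument;
// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// Select only the href attributes whose value mentions facebook.com.
$facebookLinks = [];
foreach ($xp->query("//a[contains(@href, 'facebook.com')]/@href") as $attr) {
    $facebookLinks[] = $attr->nodeValue;
}

print_r($facebookLinks);
```

Because the parser reads the attribute value, it does not matter whether the page used single or double quotes, or which other attributes sit around href.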


I would suggest using DOMDocument for this very purpose rather than using regex. Here is a quick code sample for your case:

$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

$hrefTags = $dom->getElementsByTagName("a");
foreach ($hrefTags as $hrefTag) {
    $links[] = $hrefTag->getAttribute("href");
}

print_r($links); // dump all links
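
To narrow that list down to the Facebook link, one option is to check each href's host with parse_url. The following is only a sketch: findFacebookLinks and the $sample markup are names invented for this illustration, and it assumes you want facebook.com plus any subdomain such as www.facebook.com.

```php
<?php
/**
 * Hypothetical helper: returns the href of every <a> whose host is
 * facebook.com or a subdomain of it.
 */
function findFacebookLinks(string $content): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world markup
    $dom->loadHTML($content);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // parse_url() gives null or false when there is no usable host.
        if (!is_string($host)) {
            continue;
        }
        $host = strtolower($host);
        // Match the bare domain or anything ending in ".facebook.com".
        if ($host === 'facebook.com' || substr($host, -13) === '.facebook.com') {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample input: single quotes, double quotes, and a decoy link.
$sample = "<a href='http://facebook.com/username'>a</a>"
        . '<a href="http://www.facebook.com/username">b</a>'
        . '<a href="http://example.com/?ref=facebook.com">c</a>';

print_r(findFacebookLinks($sample));
```

Comparing the host instead of searching the whole URL skips links that merely mention facebook.com in a query string, like the third one in the sample.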