Extracting the Anchor Text from the RSS

Folks,

I tried every PHP technique I know to extract the domain-name strings from an RSS feed and store each domain name as an array element, but all in vain:

Here is the RSS: http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php

The feed contains a list of linked domain names. All I need is to extract them. They look like "abc.co uk" (there is a space between .co and uk, which I can fix with str_replace).
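The clean-up step can be sketched as below. It assumes each extracted name has a space exactly where the final dot belongs. `normalizeDomain` is a hypothetical helper name and is not part of any library:

```php
<?php
// Hypothetical helper: turn a scraped name such as "abc.co uk" into "abc.co.uk".
// Assumes the only defect is whitespace standing in for the final dot.
function normalizeDomain(string $raw): string
{
    // Trim surrounding whitespace first so it is not turned into dots.
    $trimmed = trim($raw);
    // Replace each remaining inner space with a dot.
    return str_replace(' ', '.', $trimmed);
}

echo normalizeDomain("abc.co uk"), "\n"; // prints abc.co.uk
```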

Here is my first attempt (using SimpleHTMLDomParser):

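require_once('simple_html_dom.php');

$html = file_get_html('http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php');

// find() with an index (e.g. 0) returns a single element, not a list,
// so the index is left off here to get every matching <a> element.
// Note: this selector assumes HTML markup with a div.entry container.
$domains = $html->find('div[class="entry"] a');

foreach ($domains as $dom)
{
    // Turn the space in "abc.co uk" into a dot, one domain per line.
    echo str_replace(' ', '.', $dom->plaintext) . '<br/>';
}

$html->clear();
unset($html);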

Here is another attempt, this time decoding the response with json_decode():

$scrapeurl = 'http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php';

$keywords = file_get_contents($scrapeurl);

// The URL serves RSS (XML), not JSON, so json_decode() returns null here
// and $keywords->responseData does not exist.
$keywords = json_decode($keywords);

foreach ($keywords->responseData->results as $keyword)
{
    echo str_replace("...", ".", $keyword->title) . '<br/>';
}

In both cases the document is loaded, but it seems to contain everything except the domain names I want to extract.

Please help me extract the domain names.

Cheers.
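Because the feed is XML rather than JSON or an HTML page, DOMDocument (mentioned in the question) can read it with loadXML() and query it with DOMXPath. The following is a minimal sketch. It assumes the domain names sit in each `<item>`'s `<title>` and that the feed is already in a string, for example from file_get_contents() on the URL above. `extractTitles` is a hypothetical name:

```php
<?php
// Hypothetical helper: return the text of every <item><title> in an RSS string.
// Assumes the domain names live in the item titles.
function extractTitles(string $rssXml): array
{
    // loadXML() rejects empty input, so bail out early.
    if (trim($rssXml) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // loadXML() returns false if the string is not well-formed XML.
    if (!$doc->loadXML($rssXml)) {
        return [];
    }
    $xpath = new DOMXPath($doc);
    $titles = [];
    // Only item titles are selected; the channel's own <title> is skipped.
    foreach ($xpath->query('/rss/channel/item/title') as $node) {
        // textContent is the element's text without markup.
        $titles[] = trim($node->textContent);
    }
    return $titles;
}

$sample = '<rss version="2.0"><channel><title>Feed</title>'
    . '<item><title>abc.co uk</title></item>'
    . '<item><title>xyz.co uk</title></item>'
    . '</channel></rss>';

print_r(extractTitles($sample));
```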


Try this:

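// Fetch the feed and parse it as XML
$xmlobj = simplexml_load_string(file_get_contents("http://bulliesatwork.co.uk/master/dev/domp/expdom/domains.php"));

// Select the <title> of every <item> in the channel
$res = $xmlobj->xpath("/rss/channel/item/title");
$names = array();
// foreach replaces a while(list( , $node) = each($res)) loop:
// each() was deprecated in PHP 7.2 and removed in PHP 8.0.
foreach ($res as $node) {
    // Cast each SimpleXMLElement to a plain string
    $names[] = (string)$node;
}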

$names has all the names you want: you'll need to do the string replace yourself.
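The string replacement can be folded into the same loop. Here is a minimal sketch that works on an RSS string, so it can be tried offline. As in the question, it assumes the only defect is a space where the final dot belongs. `domainsFromRss` is a hypothetical name:

```php
<?php
// Hypothetical helper: parse an RSS string and return cleaned-up domain names.
// Assumes each <item><title> holds one name such as "abc.co uk".
function domainsFromRss(string $rssXml): array
{
    $xml = simplexml_load_string($rssXml);
    // simplexml_load_string() returns false when the XML cannot be parsed.
    if ($xml === false) {
        return [];
    }
    $domains = [];
    foreach ($xml->xpath('/rss/channel/item/title') as $title) {
        // Cast to string, trim, then turn the remaining spaces into dots.
        $domains[] = str_replace(' ', '.', trim((string) $title));
    }
    return $domains;
}

$sample = '<rss version="2.0"><channel>'
    . '<item><title>abc.co uk</title></item>'
    . '<item><title>example.co uk</title></item>'
    . '</channel></rss>';

print_r(domainsFromRss($sample));
```

For the live feed, the argument would be the result of file_get_contents() on the feed URL.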
