
Problems parsing Facebook atom feed using PHP

I am having problems parsing a Facebook Atom feed.

Using:

PHP's DOMDocument.

Errors:

    Warning: DOMDocument::loadXML(): xmlParseEntityRef: no name in Entity, line: 12
    Warning: DOMDocument::loadXML(): xmlParseEntityRef: no name in Entity, line: 12
    Warning: DOMDocument::loadXML(): Entity 'euro' not defined in Entity, line: 16
    Warning: DOMDocument::loadXML(): Entity 'acute' not defined in Entity, line: 16
    Warning: DOMDocument::loadXML(): Entity 'euro' not defined in Entity, line: 16
    Warning: DOMDocument::loadXML(): Entity 'acute' not defined in Entity, line: 16
    Notice: Trying to get property of non-object in ... on line 76

Unfortunately, the entities named in these warnings do not appear anywhere in the feed's source, so the cause must be something else. Other feeds parse fine with the same code, so I suspect the problem lies in Facebook's HTML inside the content tag. What could be causing this, and how can I fix it?

    <content type="html">&lt;div class=&quot;ext_media clearfix has_extra has_thumb&quot;&gt;&lt;div class=&quot;title&quot;&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=BPq58p0K6DM&amp;feature=youtu.be&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot; title=&quot;http://www.youtube.com/watch?v=BPq58p0K6DM&amp;amp;feature=youtu.be&quot; onmousedown=&quot;UntrustedLink.bootstrap($(this), &quot;-AQBiGfHA&quot;, event, bagof(null));&quot;&gt;Did you know there were this many satellites in orbit VIDEO&lt;/a&gt;&lt;/div&gt;&lt;div class=&quot;url&quot;&gt;Quelle: www.youtube.com&lt;/div&gt;&lt;div class=&quot;story_posted_item clearfix&quot;&gt;&lt;div class=&quot;extra&quot;&gt;&lt;div class=&quot;share_thumb&quot;&gt;&lt;a href=&quot;http://www.youtube.com/watch?v=BPq58p0K6DM&amp;feature=youtu.be&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot; onmousedown=&quot;UntrustedLink.bootstrap($(this), &quot;2AQBCjOTV&quot;, event, bagof(null));&quot;&gt;&lt;img class=&quot;img_loading img&quot; src=&quot;http://i3.ytimg.com/vi/BPq58p0K6DM/default.jpg&quot; alt=&quot;&quot; onload=&quot;var img = this; onloadRegister(function() { adjustImage(img); });&quot; id=&quot;share_thumb_257759307568958&quot; /&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</content>

See the full feed code here. (See feed in browser.)

Full PHP code that I am using:

    $feed_xml_str = ...;

print '<pre>';
print_r( xmlstr_to_array($feed_xml_str) );
print '</pre>';

function xmlstr_to_array($xmlstr) {
    $doc = new DOMDocument();
    $doc->loadXML($xmlstr);
    return domnode_to_array($doc->documentElement);
}
function domnode_to_array($node) {
    $output = array();
    switch ($node->nodeType) {
        case XML_CDATA_SECTION_NODE:
        case XML_TEXT_NODE:
            $output = trim($node->textContent);
            break;
        case XML_ELEMENT_NODE:
            for ($i=0, $m=$node->childNodes->length; $i<$m; $i++) {
                $child = $node->childNodes->item($i);
                $v = domnode_to_array($child);
                if(isset($child->tagName)) {
                    $t = $child->tagName;
                    if(!isset($output[$t])) {
                        $output[$t] = array();
                    }
                    $output[$t][] = $v;
                }
                elseif($v) {

                    // >>>>> WJ: OUT COMMENTED CODE >>>>>
                    //$output = (string) $v;
                    // >>>>> WJ: ADDED CODE >>>>>
                    if($node->attributes->length) {
                        $a = array();
                        foreach($node->attributes as $attrName => $attrNode) {
                            $a[$attrName] = (string) $attrNode->value;
                        }
                        $output['@attributes'] = $a;
                        $output['@value'] = (string) $v;
                    }
                    else
                        $output = (string) $v;
                    // >>>>> WJ: MODIFIED CODE END >>>>>

                }
            }
            if(is_array($output)) {
                if($node->attributes->length) {
                    $a = array();
                    foreach($node->attributes as $attrName => $attrNode) {
                        $a[$attrName] = (string) $attrNode->value;
                    }
                    $output['@attributes'] = $a;
                }
                foreach ($output as $t => $v) {
                    if(is_array($v) && count($v)==1 && $t!='@attributes') {
                        $output[$t] = $v[0];
                    }
                }
            }
            break;
    }
    return $output;
}


Facebook sniffs the User-Agent header and won't return the XML feed you see in a browser unless your request sends one. You can set one in several ways before fetching the XML from their server:

ini_set("user_agent", "my_awesome_magic_user_agent_which_can_be_anything");

Or:

stream_context_set_default(
     array(
        "http"=>array(
           "user_agent"=>"whatever"  
         )
     ));

Next time, you might want to echo your XML string to see what is really going on.
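Putting the two pieces of advice together, here is a minimal sketch of how the fetch and the parse could be guarded. The helper names `fetch_feed` and `load_xml_or_fail` and the default user-agent string are made up for illustration; only the PHP built-ins (`stream_context_create`, `file_get_contents`, `libxml_use_internal_errors`, `libxml_get_errors`) are real. It assumes the feed is fetched with `file_get_contents`, which the question does not actually show.

```php
<?php
// Hypothetical helper: fetch a URL while sending an explicit User-Agent.
// The default user-agent value is an arbitrary placeholder.
function fetch_feed($url, $userAgent = 'my-feed-reader/1.0')
{
    $context = stream_context_create(array(
        'http' => array('user_agent' => $userAgent),
    ));
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}

// Hypothetical helper: parse an XML string, and if parsing fails, throw an
// exception that includes libxml's messages and the start of the input,
// so you can see what the server really sent.
function load_xml_or_fail($xml)
{
    if ($xml === '') {
        throw new RuntimeException('Empty response, nothing to parse');
    }

    // Collect parser errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new RuntimeException(
            "Not well-formed XML:\n" . implode("\n", $messages)
            . "\nInput starts with: " . substr($xml, 0, 200)
        );
    }
    return $doc;
}
```

With these in place, `xmlstr_to_array()` could call `load_xml_or_fail($xmlstr)` instead of `loadXML()` directly. If the response is not the feed you expected, you then get an exception showing the first bytes of what was returned, rather than a "non-object" notice further down.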
