php/curl not returning correct data

Here's a small sample of some test code that simply goes to

http://www.un.org/apps/news/story.asp?NewsID=37180&Cr=Haiti&Cr1=

and pulls in the specified web page.

<?php
    $url = "http://www.un.org/apps/news/story.asp?NewsID=37180&Cr=Haiti&Cr1=";
    $curl = curl_init();    // initialize curl handle
    curl_setopt($curl, CURLOPT_URL, $url); // URL to fetch (a plain GET request)
    curl_setopt($curl, CURLOPT_FAILONERROR, 1); // treat HTTP status codes >= 400 as failures
    curl_setopt($curl, CURLOPT_COOKIESESSION, TRUE); // start a new cookie session
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);// allow redirects
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($curl, CURLOPT_TIMEOUT, 20); // times out after 20 seconds
    curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.0) Gecko/20060728 Firefox/1.5.0" );
    $result = curl_exec($curl); // run the whole process
    print $result;

When I look at the result, however, it's not quite what I want. If you look in the results for the string

"United Nations humanitarian officials are calling for ?massive mobilization activities? in Haiti"

you can see the two question marks surrounding the text "massive mobilization activities".

If you go to the actual website, the question marks are rendered as a pair of left and right quotation marks, and viewing the page source shows the same thing:

"United Nations humanitarian officials are calling for “massive mobilization activities” in Haiti"

I'd like to know how I can grab the double quotes rather than the question marks that I'm seeing.

All suggestions gratefully accepted.

And happy new year to y'all


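Nothing to do with PHP, nothing to do with curl, not even an error. Those "question marks" you mention are the bytes 0x93 and 0x94. They are not ASCII (which stops at 0x7F); in the Windows-1252 code page they are the opening and closing curly double quotes. Whatever is displaying your output is reading it in a different encoding, so it shows a ? for each byte it can't map. I'm not a PHP guy but if you want regular double quotes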

$result = str_replace(array(chr(0x93), chr(0x94)), '"', $result);

should fix you right up.
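
If you would rather keep the curly quotes than flatten them to plain ones, an alternative to the replacement above is to convert the whole response to UTF-8. This is only a sketch: it assumes the page really is served as Windows-1252 (check the Content-Type header or the page's meta tag), and the helper name cp1252ToUtf8 is made up for illustration, not something from the original code.

```php
<?php
// Hypothetical helper (not part of the original post): convert a byte string
// that is assumed to be Windows-1252 into UTF-8, so that 0x93 and 0x94 become
// the real curly quotes U+201C and U+201D.
function cp1252ToUtf8(string $bytes): string
{
    // iconv() returns false if the input contains a byte it cannot convert;
    // in that case hand back the input unchanged rather than an empty value.
    $converted = iconv('Windows-1252', 'UTF-8', $bytes);
    return $converted === false ? $bytes : $converted;
}

// Sample input built from the raw bytes discussed above.
$sample = "calling for \x93massive mobilization activities\x94 in Haiti";
echo cp1252ToUtf8($sample), PHP_EOL;
```

In the original script this would be applied to $result before printing, and the output would then need to be viewed as UTF-8 (for example by sending a matching charset header) for the quotes to display correctly.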


It looks like the " used in the example above is a special character rather than a normal ". View the page source and copy and paste it into Notepad; if it shows you ? instead of ", it is a special character, and you need to figure out the exact code for that character.
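
One way to find the exact code is to list every byte above 0x7F in the fetched string. The snippet below is a minimal sketch; findNonAsciiBytes is a hypothetical helper name, and it reports raw byte values only, without guessing which encoding they belong to.

```php
<?php
// Hypothetical helper: return every non-ASCII byte in $data,
// keyed by its byte offset, formatted as a hex string such as "0x93".
function findNonAsciiBytes(string $data): array
{
    $found = [];
    $length = strlen($data);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($data[$i]);
        if ($byte > 0x7F) {            // anything above 0x7F is outside ASCII
            $found[$i] = sprintf('0x%02X', $byte);
        }
    }
    return $found;
}

// Sample input using the two bytes discussed in this thread.
print_r(findNonAsciiBytes("for \x93massive\x94"));
```

Running it on $result from the question would show where the unexpected bytes sit, which can then be looked up in the code page the site declares.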
