Cannot show the downloaded webpage with proper encoding using PHP
I have to fetch the content of a Persian web page and show part of it to some users. The problem is that after I filter the page content, it is not displayed with the proper encoding. The page is located at sena.ir, and here is a screenshot of the part of the original page I want to show:
Screenshot: http://img502.imageshack.us/img502/983/original.gif
And here is what I got:
Screenshot: http://www.freeimagehosting.net/uploads/812cebe6b3.gif
Here is the function I use to get the page content:
function getPage($url, $referer="", $timeout="", $header=""){
    // isset() is always true for a parameter with a default value,
    // so use empty() to fall back to 30 seconds.
    if(empty($timeout))
        $timeout=30;
    $curl = curl_init();
    // Only send a Referer when a full URL was supplied.
    if(strstr($referer,"://")){
        curl_setopt ($curl, CURLOPT_REFERER, $referer);
    }
    $headers = array();
    $headers [] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers [] = 'Connection: Keep-Alive';
    $headers [] = 'Content-type: application/x-www-form-urlencoded;charset=utf-8'; // I tried iso-..... as well, but no luck
    $user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)';
    $compression = "gzip";
    curl_setopt ($curl, CURLOPT_HTTPHEADER, $headers );
    curl_setopt ($curl, CURLOPT_HEADER, 0 );          // do not include response headers in the returned body
    curl_setopt ($curl, CURLOPT_USERAGENT, $user_agent );
    curl_setopt ($curl, CURLOPT_RETURNTRANSFER, 1 );  // return the body instead of printing it
    curl_setopt ($curl, CURLOPT_FOLLOWLOCATION, 1 );
    curl_setopt ($curl, CURLOPT_POST, 0 );
    curl_setopt ($curl, CURLOPT_ENCODING, $compression );
    curl_setopt ($curl, CURLOPT_TIMEOUT, $timeout );  // was hard-coded to 300, which ignored $timeout
    curl_setopt ($curl, CURLOPT_SSL_VERIFYHOST, 0 );  // certificate checks are disabled here
    curl_setopt ($curl, CURLOPT_SSL_VERIFYPEER, 0 );
    curl_setopt ($curl, CURLOPT_URL, $url);
    $html = curl_exec ($curl);
    curl_close ($curl);
    return $html;
}
$content = getPage("http://sena.ir/");
$p1 = strpos($content,'<TABLE cellSpacing="3" cellPadding="3" width="100%" border="0">');
$p2 = strpos($content,"</TABLE>",$p1);
$content = substr($content, $p1, $p2-$p1);
echo $content;
The data was not the problem; the output was. Because the proxy-like function strips the HTML head and its encoding declarations, the browser no longer knows which character set to use. You have to add these lines before you output the filtered data:
<html lang="fa"> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"> 
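As a rough sketch of the same idea in PHP: the snippet below wraps the extracted fragment in a minimal document that declares its encoding, and also sends the charset in the HTTP Content-Type header. The helper name wrapFragment and its parameters are made up for illustration and are not part of the original code. It assumes the fetched bytes are already UTF-8, as the answer above suggests.

```php
<?php
// Hypothetical helper (not from the original post): wraps an extracted HTML
// fragment in a minimal document that declares its character encoding.
function wrapFragment($fragment, $charset = 'utf-8', $lang = 'fa') {
    return "<html lang=\"" . htmlspecialchars($lang) . "\">\n"
         . "<head>\n"
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
         . htmlspecialchars($charset) . "\">\n"
         . "</head>\n"
         . "<body>\n"
         . $fragment . "\n"
         . "</body>\n"
         . "</html>\n";
}

// Declare the charset in the HTTP response header too, if headers can still be sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Placeholder fragment; in the question this would be the filtered $content.
echo wrapFragment('<table><tr><td>سلام</td></tr></table>');
```

In the question's script this would amount to replacing the final echo $content; with echo wrapFragment($content);. The header() call must run before any output is sent.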
 