
php file_put_contents asian character filename encoding

I'm trying to get this script to scrape images off of Wikipedia. What good is free licensed media if you can't get it? Original script is here.

If you put this

http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png

in Firefox, it will immediately be transformed into

http://upload.wikimedia.org/wikipedia/commons/2/26/的-bw.png

so that when you save the image, it's saved as 的-bw.png

Simple enough, eh? Now how do I get PHP to do that? Just guessing, I tried utf8_decode($fileName), but I'm getting the wrong Chinese characters.

$src= "http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png";  
$pngData = file_get_contents($src);  
$fileName = basename($src);  
file_put_contents($fileName, $pngData);

Any help appreciated, as I really have no idea where to go from here.


Have you tried urldecode()?

<?php
$url = 'http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png';
$parts = explode('/', $url);
$title = $parts[count($parts)-1]; //get last section

$title = urldecode($title);
?>
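
Putting the pieces together, here is a minimal sketch of how the decoded name could feed into file_put_contents(). The helper name decodedFileNameFromUrl() is hypothetical, not something from the original script. It uses rawurldecode() rather than urldecode() because urldecode() also turns "+" into a space, which is usually not wanted for a path segment. It takes basename() of the path while it is still percent-encoded ASCII and only then decodes it. This assumes the filesystem accepts UTF-8 file names (typical on Linux and macOS; on Windows that is, as far as I know, only the case from PHP 7.1 on).

```php
<?php
// Hypothetical helper (not from the original post): derive a local file name
// from the last path segment of a URL, decoding percent-escapes to UTF-8 bytes.
function decodedFileNameFromUrl(string $url): string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        throw new InvalidArgumentException("URL has no path: $url");
    }

    // Take the last segment while it is still plain ASCII, then decode it.
    $encoded = basename($path);
    $decoded = rawurldecode($encoded);

    // A decoded %2F or %5C would reintroduce a path separator; replace it.
    return str_replace(['/', '\\'], '_', $decoded);
}

$src = 'http://upload.wikimedia.org/wikipedia/commons/2/26/%E7%9A%84-bw.png';
$fileName = decodedFileNameFromUrl($src);

// The download itself is left commented out so the sketch runs offline:
// $pngData = file_get_contents($src);
// if ($pngData !== false) {
//     file_put_contents($fileName, $pngData);
// }
```

With the example URL, $fileName should come out as the UTF-8 bytes for 的-bw.png, the same name Firefox shows.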


SquirrelMail's source includes a handy function that converts UTF-8 text to numeric HTML entities:

<?php
function charset_decode_utf_8($string) {
    /* Only do the slow convert if there are 8-bit characters */
    if (!preg_match('/[\x80-\xFF]/', $string)) {
        return $string;
    }

    // decode three-byte UTF-8 sequences into numeric HTML entities
    // (preg_replace_callback stands in for the /e modifier, which PHP 7 removed)
    $string = preg_replace_callback(
        '/([\xE0-\xEF])([\x80-\xBF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 224) * 4096 + (ord($m[2]) - 128) * 64 + (ord($m[3]) - 128)) . ';';
        },
        $string
    );

    // decode two-byte UTF-8 sequences
    $string = preg_replace_callback(
        '/([\xC0-\xDF])([\x80-\xBF])/',
        function ($m) {
            return '&#' . ((ord($m[1]) - 192) * 64 + (ord($m[2]) - 128)) . ';';
        },
        $string
    );

    return $string;
}
?>