PHP encoding from ISO-8859-1 to UTF-8
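<?php
mb_internal_encoding('UTF-8');
mb_language('uni');

// Fetch the page; its meta tag declares iso-8859-1.
$a = file_get_contents("http://www.ciao.de/Erfahrungsberichte/8x4_Wild_Flower_Deo_Spray__8937431");

// Extract the raw, unconverted snippet between the two marker phrases.
preg_match('/dass auf dem Versch(.*)ziehen mich/Us', $a, $b);
$b = $b[1];

echo $b . "\n";                                              // raw bytes as fetched
echo utf8_encode($b) . "\n";                                 // ISO-8859-1 -> UTF-8
echo mb_convert_encoding($b, 'UTF-8', 'iso-8859-1') . "\n";  // same conversion via mbstring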
results in
lussdeckel riesengro▒ und un▒bersehbar glitzernd ein ▒New▒ prangt. Neue Produkte
lussdeckel riesengroß und unübersehbar glitzernd ein �New� prangt. Neue Produkte
lussdeckel riesengroß und unübersehbar glitzernd ein �New� prangt. Neue Produkte
The page's HTML source declares "iso-8859-1" in its meta tag. The German umlauts come out fine, but why are the quotes around "New" not converted correctly? The PHP manual has a function fix_latin, and with that function the quotes are converted correctly too!?
PS: The same happens with the European currency symbol € (EUR). It is also converted wrongly (except by the fix_latin function), but why?
Euro sign is not in ISO-8859-1. (ISO-8859-15 was created for that purpose.)
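A likely explanation for the quotes as well, offered as an assumption since the page itself was not re-checked: the site labels its content iso-8859-1 but actually sends Windows-1252, which is also how browsers interpret that label. Windows-1252 puts the curly quotes and the euro sign in the byte range 0x80-0x9F, which ISO-8859-1 reserves for invisible control characters. A minimal sketch with made-up sample bytes (not taken from the page) shows the difference:

```php
<?php
// Hypothetical sample: the bytes a Windows-1252 page would send for “New” 5 €.
$bytes = "\x93New\x94 5 \x80";

// Read as ISO-8859-1, bytes 0x80-0x9F become C1 control characters (U+0080-U+009F).
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

// Read as Windows-1252, the same bytes become the intended punctuation.
$asCp1252 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // control characters, shown as hex because they are not printable
echo $asCp1252, "\n";
```

If the second line prints the quotes and the euro sign correctly, the source encoding in the original script should be 'Windows-1252' rather than 'iso-8859-1'.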
Best I recollect, mb_convert_encoding() will not transliterate characters. Consider using iconv() instead. And/or be sure to set the content-type header as needed.
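A rough sketch of the iconv() route, assuming the input is really Windows-1252 (CP1252) rather than ISO-8859-1; the sample bytes are invented for illustration. The //TRANSLIT suffix asks iconv to approximate characters the target charset lacks, and what it produces depends on the iconv implementation PHP was built against, so no output is promised for that line.

```php
<?php
// Hypothetical sample bytes: “New” 5 € as encoded in Windows-1252.
$bytes = "\x93New\x94 5 \x80";

// Plain conversion: every CP1252 character has a UTF-8 representation.
$utf8 = iconv('CP1252', 'UTF-8', $bytes);
if ($utf8 === false) {
    exit("iconv could not convert the input\n");
}

// Transliteration: ISO-8859-1 has no euro sign or curly quotes,
// so iconv substitutes something implementation-dependent.
$latin1 = @iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

// Tell the browser what is being sent (skipped on the command line).
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo $utf8, "\n";
```

The header only matters when the output goes to a browser; without it the client may guess a different charset and show the same garbled characters even though the conversion was correct.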
In the next PHP version there will also be the Transliterator class, which wraps ICU.