Charset problem, MySQL and get_meta_tags()
I'm trying to get HTML meta tags with PHP by using the get_meta_tags() function. I'm using UTF-8 for the tables, the charset/collations, the MySQL connection charset, and everything else.
But unfortunately MySQL cuts off the string when inserting into the table. It happens when the HTML encoding is something other than UTF-8 (for example ISO 8859-1).
Is there any way to convert strings to UTF-8 without knowing their encoding?
Example:
<?php
// ------------------------------------------------------------
header('Content-Type:text/html; charset=utf-8');
// ------------------------------------------------------------
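function str_to_utf8($string) {
    // Strict check: returns false when $string is not valid UTF-8
    if (mb_detect_encoding($string, 'UTF-8', true) === false) {
        // utf8_encode() assumes the input is ISO-8859-1 (deprecated as of PHP 8.2)
        $string = utf8_encode($string);
    }
    return $string;
}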
// ------------------------------------------------------------
$url = 'http://example.org'; // ---- The URL to get Meta-Tags from ---
// ------------------------------------------------------------
$meta_raw = get_meta_tags($url) ?: array(); // empty array if the page cannot be read
$meta_enc = array();
foreach ($meta_raw as $mkey => $mval) {
    $meta_enc[$mkey] = str_to_utf8($mval);
}
// ------------------------------------------------------------
print "<p>the (old) raw data</p>\n";
print "<pre style=\"margin:6px; padding:6px; background:#FFFFCC; text-align:left;\">\n";
print_r($meta_raw);
print "</pre>\n";
print "<br />\n";
print "<br />\n";
// ------------------------------------------------------------
print "<p>the (new) utf8 encoded data</p>\n";
print "<pre style=\"margin:6px; padding:6px; background:#DEDEDE; text-align:left;\">\n";
print_r($meta_enc);
print "</pre>\n";
print "<br />\n";
print "<br />\n";
// ------------------------------------------------------------
?>
:)
In the function str_to_utf8($string) { ... } you can also use different ways to detect and encode the $string, such as iconv() or mb_convert_encoding().
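For instance, a variant built on mb_check_encoding() and iconv() might look like the sketch below. The function name str_to_utf8_iconv() and the ISO-8859-1 default are assumptions made for illustration. If the page really uses another single-byte charset, the result will be valid UTF-8 but may contain the wrong characters.

```php
<?php
// Hypothetical variant of str_to_utf8(); the ISO-8859-1 default is an assumption.
function str_to_utf8_iconv($string, $assumed = 'ISO-8859-1') {
    // Strings that are already valid UTF-8 are returned untouched.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }
    // Otherwise treat the bytes as the assumed source charset.
    $converted = iconv($assumed, 'UTF-8', $string);
    // iconv() returns false on failure; hand back the original in that case.
    return $converted !== false ? $converted : $string;
}
```

Calling str_to_utf8_iconv($mval) inside the foreach loop would be a drop-in replacement for str_to_utf8($mval).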
Encodes an ISO-8859-1 string to UTF-8 (PHP 3 >= 3.0.6, PHP 4, PHP 5)
string utf8_encode ( string data )
Convert string to requested character encoding (PHP 4 >= 4.0.5, PHP 5)
string iconv ( string in_charset, string out_charset, string str )
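However, if you want to convert to UTF-8 without knowing the source encoding in advance, check out mb_convert_encoding(), whose from_encoding argument can also be a list of candidate encodings to try: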
Convert character encoding (PHP 4 >= 4.0.6, PHP 5)
string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
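As a rough sketch of how the detection and the conversion could be combined: the helper name to_utf8_guess() and the candidate list are assumptions, not part of the mbstring API. Detection is only a guess, and with several single-byte candidates the outcome can depend on their order and on the PHP version.

```php
<?php
// Hypothetical helper; the default candidate list is an assumption.
function to_utf8_guess($string, array $candidates = array('ISO-8859-1')) {
    // Nothing to do if the bytes already form valid UTF-8.
    if (mb_check_encoding($string, 'UTF-8')) {
        return $string;
    }
    // Strict detection: only candidates in which the bytes are valid qualify.
    $from = mb_detect_encoding($string, $candidates, true);
    if ($from === false) {
        return false; // no candidate matched, let the caller decide
    }
    return mb_convert_encoding($string, 'UTF-8', $from);
}
```

Returning false when nothing matches lets the caller skip the row or log the URL instead of inserting a truncated string into MySQL.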