
Encoding MySQL text fields into UTF-8 text files - problems with special characters

I'm writing a PHP script to export MySQL database rows into a .txt file formatted for Adobe InDesign's internal markup.

Exports work, but when I encounter special characters like é or umlauts, I get weird symbols (e.g. ChloÃ« Hanslip instead of Chloë Hanslip). Rather than run a search and replace for every possible weird character, I need a better method.

I've checked that when the text hits the database, it's saved properly - in the database I see the special characters. My export code basically runs some regular expressions to put in the InDesign code tags, and I'm left with the weird symbols. If I just output the text to the browser (rather than prompt for a text file download), it displays properly. When I save the file I use this code:

header("Content-disposition: attachment; filename=test.txt");

header("Content-Type: text/plain; charset=utf-8");

I've tried various combinations of utf8_encode() and iconv() to no avail. Can anybody point me in the right direction here?


InDesign wouldn't be able to use any encoding specified in the header. (It wouldn't even see it, as it's not kept when you save to disc in Windows.) Instead you have to explicitly tell it the encoding in a special tag of its own at the start of the file, such as:

<ANSI-WIN>

Unfortunately, it does not use standard encoding names and there is no tag that InDesign understands that corresponds to UTF-8 encoding at all. The only encoding tag you can use that will allow you to include any character you like is:

<UNICODE-WIN>

which corresponds to UTF-16 (little-endian with BOM), with Windows CRLF line endings. (The only other line ending option is MAC, which you don't want at all as it's old-school pre-OSX Macs where the line ending character was CR.)

So, given a UTF-8 string $s including UTF-8 byte sequences you've pulled out of the database and plain (Unix-Linux-OSX-web-style) LF newlines, you'd write it like this:

$s= "<UNICODE-WIN>\r\n".str_replace("\n", "\r\n", $s);
echo iconv('UTF-8', 'UTF-16', $s);

(Make sure not to output any whitespace before or after, because it will break the UTF-16 encoding.)
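As a minimal sketch, the recipe above could be wrapped in a function like the one below. The name `toInDesignTaggedText` is hypothetical, not part of the answer. Two choices here are assumptions of mine: the line endings are normalized first so that text already containing CRLF does not end up with doubled CRs, and the byte order is named explicitly (`UTF-16LE`) with the BOM added by hand, rather than relying on what plain `UTF-16` produces on a given platform.

```php
<?php
// Hypothetical helper (not from the original answer): wraps a UTF-8 string
// as InDesign tagged text in UTF-16LE with a BOM and CRLF line endings.
function toInDesignTaggedText(string $utf8): string
{
    // Normalize CRLF, CR or LF to CRLF so existing CRLFs are not doubled.
    $body = preg_replace('/\r\n|\r|\n/', "\r\n", $utf8);

    // The encoding tag goes on the first line of the file.
    $text = "<UNICODE-WIN>\r\n" . $body;

    // Name the byte order explicitly, then prepend the little-endian BOM.
    $utf16 = iconv('UTF-8', 'UTF-16LE', $text);
    if ($utf16 === false) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return "\xFF\xFE" . $utf16;
}
```

After the two header() calls, `echo toInDesignTaggedText($s);` would then be the only output of the script.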


Before exporting, you can use the SET NAMES command to change the encoding used for the connection, e.g.:

SET NAMES utf8;

You can configure this in your MySQL backup software.


Just call mysql_set_charset('utf8'); in PHP after connecting to the database.


Looks like an ISO-8859-1 string is sent as UTF-8...
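To illustrate the kind of mix-up being described, here is a small sketch that reproduces the symptom on purpose. The helper name `misreadAsLatin1` is hypothetical. It takes bytes that are already UTF-8, pretends they are ISO-8859-1, and encodes them to UTF-8 a second time, which turns the two bytes of "ë" (c3 ab) into four bytes (c3 83 c2 ab) that display as "Ã«".

```php
<?php
// Hypothetical helper: simulates the bug by treating UTF-8 bytes as if they
// were ISO-8859-1 and encoding them to UTF-8 a second time.
function misreadAsLatin1(string $utf8): string
{
    return iconv('ISO-8859-1', 'UTF-8', $utf8);
}

$name = "Chlo\xC3\xAB Hanslip";               // "Chloë Hanslip" as UTF-8 bytes
echo bin2hex($name), "\n";                    // hex of the correct bytes
echo bin2hex(misreadAsLatin1($name)), "\n";   // hex of the double-encoded bytes
```

Comparing the hex dump of the exported file with these two patterns is one way to tell whether the text was encoded twice somewhere between the database and the download.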

Make sure your table and fields are in UTF-8, and connect to the database in UTF-8 too. If your table and fields are in UTF-8 but you don't specify the connection charset, MySQL will convert the data on the fly to ISO-8859-1 (latin1). That's the default configuration on all the hosts I've used so far.

This is how I do it (backward compatible with PHP 5.2.2 and earlier):

//Note: the mysql_* extension used here was removed in PHP 7
$conn = mysql_connect('localhost', 'user', 'pass');
if ($conn === false || !mysql_select_db('dbname', $conn))
{
    //Handle database connection error here
}

if (function_exists('mysql_set_charset'))
{
    mysql_set_charset('utf8', $conn); //PHP 5.2.3+ only
}
else
{
    //Fallback for older PHP: set the session charset variables by hand
    if (mysql_query("SET character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'", $conn) === false)
    {
        //Unable to set database charset! Handle error here...
    }
}
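The mysql_* functions above were removed in PHP 7, so on a current PHP the same idea (fix the connection charset right at connect time) might look like the PDO sketch below. The function names `utf8Dsn` and `openUtf8Connection` are hypothetical, and the default of `utf8mb4` is my assumption; pass `'utf8'` to match the charset used in the answers here.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that sets the connection charset.
function utf8Dsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical helper: opens a connection whose results arrive as UTF-8.
function openUtf8Connection(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(utf8Dsn($host, $dbName), $user, $pass, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION, // throw on errors
    ]);
}
```

A call such as `openUtf8Connection('localhost', 'dbname', 'user', 'pass')` would replace the connect, select and set-charset steps in one go.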


"...then converting to UTF-16 - this resulted in a file that my text editor displayed solely as squares"

iconv may not add the BOM bytes \xff\xfe, which have to be placed at the beginning of the Unicode file.

Try this one: $out = "\xff\xfe" . iconv('UTF-8','UTF-16LE',$out);
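To see what a particular iconv installation actually does, a quick check along these lines may help. The helper `detectUtf16Bom` is hypothetical and only looks at the first two bytes. What plain `UTF-16` yields depends on the iconv implementation and platform, so no particular result is claimed for the first call.

```php
<?php
// Hypothetical helper: reports which UTF-16 byte order mark, if any,
// appears at the start of $bytes.
function detectUtf16Bom(string $bytes): string
{
    $head = substr($bytes, 0, 2);
    if ($head === "\xFF\xFE") {
        return 'UTF-16LE';
    }
    if ($head === "\xFE\xFF") {
        return 'UTF-16BE';
    }
    return 'none';
}

// Plain 'UTF-16': the result depends on the iconv implementation and platform.
echo detectUtf16Bom(iconv('UTF-8', 'UTF-16', 'A')), "\n";
// Explicit 'UTF-16LE' writes no BOM, so it is added by hand here.
echo detectUtf16Bom("\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', 'A')), "\n";
```

If the first line does not report UTF-16LE, prepending the BOM manually as shown above is the safer route.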
