
How do I encode Japanese into something like "&#26085;&#26412;&#12395;&#34892;&#12387;&#12390;"? (UTF-8)

As the question in the title states. I can't seem to find the answer with any of the following: php headers, css headers, html headers, mysql charsets (to utf8_general_ci), or

<form acceptcharset="utf-8"... >

Really stumped on this one.

I'm basically going through this process:

  1. Type Japanese characters, process through a form
  2. Form saves in MySQL DB
  3. PHP pulls data out of MySQL DB, and formats it for a webpage

At step 3, I check the code and see that it's literally displaying the Japanese characters. Because it's doing that, I'm guessing it's causing the PHP errors I'm getting (the functions that work fine for English characters aren't working so fine for the Japanese text).

So I want to encode in UTF-8 format, but I'm not sure how to do this?

Edit: Here's the PHP function I'm using on the Japanese text

function short_text_jap($text, $length=300) { 
    if (strlen($text) > $length) { 
            $pattern = '/^(.{0,'.$length.'}\\b).*$/s'; 
            $text = preg_replace($pattern, "$1...", $text); 
    } 
    return $text;
}

But instead of a shortened amount of text, it returns the whole thing.


If the goal is to convert a UTF-8 encoded string to ASCII, with every non-ASCII character replaced by a character reference, PHP's multibyte string functions can do that:

mb_substitute_character('entity');
$str = '日本語';  // UTF-8 encoded string
echo mb_convert_encoding($str, 'US-ASCII', 'UTF-8');

The output is:

&#x65E5;&#x672C;&#x8A9E;
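
If the decimal form from the question's title (&#26085; and so on) is preferred over hexadecimal, a minimal sketch using mb_encode_numericentity might look like the following. The helper name to_decimal_entities and the conversion map are assumptions made for illustration; the map is meant to cover every code point above the ASCII range.

```php
<?php
// Hypothetical helper (not part of the original answer): replace every
// non-ASCII character of a UTF-8 string with a decimal character reference.
function to_decimal_entities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // ASCII (below 0x80) is outside the range and is left untouched.
    $map = [0x80, 0x10FFFF, 0, 0xFFFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo to_decimal_entities('日本語'), "\n"; // &#26085;&#26412;&#35486;
```

mb_decode_numericentity with the same map should reverse the conversion if the literal characters are needed again.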


There seems to be some confusion about what UTF-8 is: the question states the goal as getting the "UTF-8 version" of literal Japanese characters, but text that is stored and served as UTF-8 already is that version.

Things like &#26085; are ASCII-compatible numeric character references (an HTML way of writing a Unicode code point), and they can appear in a document of any encoding, whereas UTF-8 is a multibyte encoding scheme that defines how characters are stored at the byte level.

I suggest relying on the literal form since it makes the whole mess with international alphabets easier to manage.

Simply migrate to UTF-8 everywhere: in the database, in the HTML, in PHP, and in the encoding of the source files. It then becomes possible to use the PHP Multibyte String extension, which is designed to handle multibyte characters:

mb_internal_encoding("UTF-8");

function short_text_jap($text, $length=300) {
    return mb_strlen($text) > $length ? mb_substr($text, 0, $length) : $text;
}

echo short_text_jap('日本語', 2); // outputs 日本
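
The function from the question most likely returned the text unchanged because strlen counts bytes rather than characters and, without the u modifier, \b only matches next to ASCII word characters. If no such character occurs within the first $length bytes, the pattern fails and preg_replace hands back its input untouched. As a rough sketch, a variant that also appends "..." the way the original function tried to could look like this. The name short_text_ellipsis is hypothetical, and the encoding is passed explicitly so the result does not depend on mb_internal_encoding having been called.

```php
<?php
// Hypothetical variant of short_text_jap (an illustration, not the
// answer author's code): truncate by characters and append "...".
function short_text_ellipsis(string $text, int $length = 300): string
{
    // Count characters, not bytes; short strings are returned as they are.
    if (mb_strlen($text, 'UTF-8') <= $length) {
        return $text;
    }
    return mb_substr($text, 0, $length, 'UTF-8') . '...';
}

echo strlen('日本語'), "\n";               // 9 (bytes)
echo mb_strlen('日本語', 'UTF-8'), "\n";   // 3 (characters)
echo short_text_ellipsis('日本語', 2), "\n"; // 日本...
```

Unlike the regex in the question, this cuts at an exact character count and does not look for a word boundary, which matters less for Japanese text since it is usually written without spaces.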