How to convert text to unicode in Rails?

In my database, I have the following entry

id     |      name      |      info
1          John Smith         Çö ¿¬¼

As you can tell, the info column displays wrong -- it's actually Korean, though. In Chrome, when I switch the browser encoding from UTF-8 to Korean ('euc-kr', I think), I actually manage to view the text as such:

id     |      name      |      info
1          John Smith        횉철 쩔짭쩌

I then manually copy the text into the info in the database and save, and now I can view it in UTF-8, without switching my browser's encoding.

Awesome. Now I'd like to get that same thing done in Rails, not manually. So starting with the original entry again, I go to the console and type:

require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('euc-kr','UTF-8', info)
u.update_attribute('info', new_info)

However, what I end up with is something resembling \x{A2AF}\x{A8FA}\x{A1C6} \x{A2A5}\x{A8A2} in the database, not 횉철 쩔짭쩌.

I have a very basic understanding of unicode and encoding.

Can someone please explain what's going on here and how to get around that? The desired result is what I achieved manually.

Thanks!


Wow. I'm beating myself over the head now. After hours of trying to resolve this, I finally figured it out myself a few minutes after I posted a question here.

The solution consists of three simple steps:

STEP 1:

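I almost had it right. Iconv.iconv takes the target encoding first and the source encoding second, so my original call was converting from UTF-8 to euc-kr. It needs to go the other way around, from euc-kr to UTF-8, as such: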

Iconv.iconv('UTF-8', 'euc-kr', info)
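
On Ruby 2.0 or later, where Iconv is no longer part of the standard library, String#encode takes its arguments in the same order: destination first, source second. A minimal sketch of that direction, assuming made-up sample text rather than the data from the question:

```ruby
# Made-up sample: two Hangul syllables, written as escapes so the
# encoding of this source file does not matter.
original = "\u{D55C}\u{AE00}"

# Simulate what comes back from the database: EUC-KR bytes with no
# useful encoding label attached.
euc_kr_bytes = original.encode('EUC-KR').b

# String#encode(destination, source): the same argument order as
# Iconv.iconv(to, from, str).
converted = euc_kr_bytes.encode('UTF-8', 'EUC-KR')
```

Unlike Iconv.iconv, this returns a single String rather than an array.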

STEP 2:

The text might still contain byte sequences that can't be converted, so to be safe I append //IGNORE to the target encoding, which tells Iconv to skip them instead of raising an error:

Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info)
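
Roughly the same "skip what can't be converted" behaviour is available without Iconv through the options of String#encode. This is only a sketch: the damaged input below is constructed for illustration (a valid EUC-KR character followed by a stray 0xFF byte) and is not taken from the question.

```ruby
# One valid EUC-KR character followed by a stray byte that is not
# valid EUC-KR (constructed for illustration).
good_bytes = "\u{D55C}".encode('EUC-KR').b
damaged = good_bytes + "\xFF".b

# Strict conversion raises on the bad byte.
strict_error = nil
begin
  damaged.encode('UTF-8', 'EUC-KR')
rescue Encoding::InvalidByteSequenceError, Encoding::UndefinedConversionError => e
  strict_error = e
end

# Replacing invalid or unmappable input with an empty string drops it,
# much like the //IGNORE suffix does for Iconv.
lenient = damaged.encode('UTF-8', 'EUC-KR',
                         invalid: :replace, undef: :replace, replace: '')
```

Silently dropping bytes hides data loss, so it may be worth logging the rows where the strict conversion fails.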

Finally, I actually get REAL KOREAN TEXT, yay! The problem is, when I try to insert it into the database, it's still inserting something along the lines of:

UPDATE `users` SET `info` = '--- \n- \"\\xEC\\xB2\\xA0\\xEC\\xB1\\x8C...' etc...

Even though it turns out I have the right text. So why is that? Onto the last step.

STEP 3:

Turns out Iconv.iconv returns an array of strings, not a single string, which is why the value saved above looks like a serialized array. And so, we merge it with join:

Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info).join

And this actually works!
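
The '--- \n- ' prefix in that UPDATE statement looks like YAML, which is presumably how Rails stored the non-string value. A small sketch of the difference between serializing the array and joining it, assuming a placeholder one-element array in place of the real Iconv output:

```ruby
require 'yaml'

# Placeholder for the one-element array that Iconv.iconv returns.
converted = ['text']

# Serializing the array produces a YAML document with a leading "---"
# and a "- " list marker, similar to what ended up in the column.
as_yaml = converted.to_yaml

# Joining the elements gives back a plain String.
joined = converted.join
```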

The final code:

require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('UTF-8//IGNORE','euc-kr', info).join
u.update_attribute('info', new_info)

Hope this helps whoever sees this (and knowing myself, probably future me).


Why use Iconv to convert it at all? First, if the text looks correct in the database, make sure the database's charset is utf8. Then, on the script side, just save the Korean value directly instead of using Iconv.
