How to convert text to unicode in Rails?
In my database, I have the following entry:
id | name       | info
1  | John Smith | Çö ¿¬¼
As you can tell, the info column displays incorrectly. It's actually Korean, though. In Chrome, when I switch the browser encoding from UTF-8 to Korean ('euc-kr', I think), I can view the text like this:
id | name       | info
1  | John Smith | 횉철 쩔짭쩌
I then manually copy the text into the info in the database and save, and now I can view it in UTF-8, without switching my browser's encoding.
Awesome. Now I'd like to get that same thing done in Rails, not manually. So starting with the original entry again, I go to the console and type:
require 'iconv'
u = User.find(1)
info = u.info
new_info = Iconv.iconv('euc-kr','UTF-8', info)
u.update_attribute('info', new_info)
However, what I end up with in the database is something resembling \x{A2AF}\x{A8FA}\x{A1C6} \x{A2A5}\x{A8A2}, not 횉철 쩔짭쩌.
I only have a very basic understanding of Unicode and encodings.
Can someone please explain what's going on here and how to get around that? The desired result is what I achieved manually.
Thanks!
Wow. I'm beating myself over the head now. After hours of trying to resolve this, I finally figured it out myself a few minutes after I posted a question here.
The solution consists of three simple steps:
STEP 1:
I almost had it right. I shouldn't be converting from euc-kr to utf-8, but the other way around, as such:
Iconv.iconv('UTF-8', 'euc-kr', info)
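The argument order is easy to get backwards: the target encoding comes first and the source encoding second. The sketch below shows the same order using Ruby's built-in String#encode rather than Iconv. The sample bytes are an illustrative assumption (the word 한글 encoded as EUC-KR), not data from the question.

```ruby
# EUC-KR bytes for the Korean word 한글 (illustrative sample, not from the question).
euc_kr_bytes = "\xC7\xD1\xB1\xDB".b

# String#encode takes the destination encoding first and the source second,
# the same order as Iconv.iconv(to, from, text).
utf8_text = euc_kr_bytes.encode("UTF-8", "EUC-KR")

puts utf8_text
puts utf8_text.encoding
```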
STEP 2:
I might still run into some errors in the text, so to be safe I tell Iconv to ignore any errors:
Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info)
Finally, I actually get REAL KOREAN TEXT, yay! The problem is, when I try to insert it into the database, it's still inserting something along the lines of:
UPDATE `users` SET `info` = '--- \n- \"\\xEC\\xB2\\xA0\\xEC\\xB1\\x8C...' etc...
Even though it turns out I have the right text. So why is that? Onto the last step.
STEP 3:
It turns out the output from Iconv is an array, so we merge it into a single string with join:
Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info).join
And this actually works!
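The '--- \n- ' prefix in the UPDATE statement looks like a YAML list, which is probably what happens when an Array is saved into a text column. A minimal sketch of the difference, assuming the value is serialized as YAML (the `converted` array is a hypothetical stand-in for what Iconv.iconv returns):

```ruby
require 'yaml'

# Hypothetical stand-in for the one-element array returned by Iconv.iconv.
converted = ["converted text"]

as_yaml = converted.to_yaml  # an Array dumps as a YAML document containing a list item
as_text = converted.join     # join collapses the array into a plain String

puts as_yaml.inspect
puts as_text.inspect
```

The YAML dump starts with `---` followed by a `- ` list item, which matches the shape of the value seen in the SQL, while the joined value is just the text itself.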
The final code:
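require 'iconv'
u = User.find(1)
info = u.info
# Arguments are (to, from, text); //IGNORE skips anything that cannot be converted.
# Iconv.iconv returns an array, so join it into a single string before saving.
new_info = Iconv.iconv('UTF-8//IGNORE', 'euc-kr', info).join
u.update_attribute('info', new_info)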
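Note that Iconv was removed from Ruby's standard library in Ruby 2.0 (it is still available as a gem). On newer Rubies, a rough equivalent can be sketched with String#encode. The helper name `euc_kr_to_utf8` and the sample bytes are assumptions for illustration. The `invalid`/`undef`/`replace` options play the role of //IGNORE here.

```ruby
# Hypothetical helper: treat the input as EUC-KR and transcode it to UTF-8,
# dropping bytes that are invalid or have no UTF-8 mapping (similar to //IGNORE).
def euc_kr_to_utf8(text)
  text.encode("UTF-8", "EUC-KR", invalid: :replace, undef: :replace, replace: "")
end

clean  = "\xC7\xD1\xB1\xDB".b      # 한글 in EUC-KR (illustrative sample)
broken = "\xC7\xD1\xB1\xDB\xFF".b  # same bytes followed by a stray invalid byte

puts euc_kr_to_utf8(clean)
puts euc_kr_to_utf8(broken)
```

String#encode returns a String rather than an array, so no join is needed. In the console, the result could then be passed to update_attribute in the same way as new_info above.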
Hope this helps whoever sees this (and knowing myself, probably future me).
Why do you use Iconv to convert it? First, if the text looks correct in the database, you should make sure the database's charset is utf8. Then, on the script side, you can just save the Korean value directly, without using Iconv.