UTF-8 encoding problem when importing from PHP app to Rails app
I have a Rails app (Rails 3.0.1, Ruby 1.9.2) that uses MySQL as its development database. The database is set up to use the "utf8_unicode_ci" collation, as are all of its fields. I also have a legacy PHP app that this Rails app is set to replace once it's completed. The PHP app uses MySQL as well, but all of its fields use "latin1_swedish_ci". I have written Rake tasks that use both the MySQL API and ActiveRecord to import data from the old database into the new one. This seems to go well until it encounters Unicode characters in the source database. When I use the "mysql" gem, run the rake task, and then try to load a page with Unicode characters on it, I get the following error:
incompatible character encodings: ASCII-8BIT and UTF-8
Switching to the 'ruby-mysql' gem and restarting the server fixes the problem, and the Unicode characters are displayed properly. However, it only works in this combination (import with 'mysql', serve with 'ruby-mysql'): when I import the data using the 'ruby-mysql' gem, the page renders but all of the Unicode is messed up and replaced with garbage-y characters.
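For context, this is the error Ruby 1.9 and later raise when a string tagged ASCII-8BIT (binary) that contains non-ASCII bytes is combined with a UTF-8 string that also contains non-ASCII characters. The following is a minimal sketch of that situation. It assumes the driver hands back raw bytes tagged as binary, which is what the error message suggests; the sample strings and the helper name tag_as_utf8 are made up for illustration.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper: relabel a binary string whose bytes are already
# valid UTF-8. force_encoding changes only the tag, not the bytes.
def tag_as_utf8(raw)
  raw.dup.force_encoding("UTF-8")
end

# Simulated driver output: the UTF-8 bytes for "café", tagged ASCII-8BIT.
raw = "caf\xC3\xA9".dup.force_encoding("ASCII-8BIT")

# A UTF-8 string with a non-ASCII character, as a view template might hold.
heading = "Título: "

begin
  heading + raw
  mixed_ok = true
rescue Encoding::CompatibilityError => e
  # Ruby cannot pick a common encoding for the two operands.
  mixed_ok = false
  puts "Concatenation failed: #{e.message}"
end

# Once the bytes carry the right tag, the same concatenation succeeds.
fixed = tag_as_utf8(raw)
puts heading + fixed
```

Relabelling like this is only correct when the bytes really are UTF-8. If they are in another encoding, they need to be transcoded rather than retagged.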
What can I do to fix this, or at least be able to import my data and render it without having to change the source code? I'm running MySQL server 5.1.53 from MacPorts on OS X Snow Leopard. I have compiled both the 'mysql' and 'ruby-mysql' gems as 64-bit, though I do boot OS X with "arch=i386", so that may not have been necessary.
Here is an example of a rake task:
desc "Imports posts from legacy app"
task :posts => :environment do
  my = Mysql.connect("localhost", "importer", "*password removed*", "publicweb", nil, "/opt/local/var/run/mysql5/mysqld.sock")
  res = my.query("SELECT * FROM updates")
  res.each do |row|
    post = Post.new
    post.title = Legacy.strip_slashes row[1]
    post.body = Legacy.resolve_bbcode row[3], true
    post.published_at = Time.parse(row[4])
    post.author = User.where(:login => row[2]).first
    post.old_id = row[0]
    post.old_slug = row[5]
    post.state = "published"
    post.save!
  end
  puts "Imported #{res.num_rows} posts"
end
I'm pretty sure you should be using the mysql2 gem these days, as it is better supported. This is the default in a new Rails app anyway.
In terms of solving your encoding issue, is there any reason the PHP app could not cope with UTF-8? If not, then the simplest thing to do would be to follow the steps in this article:
http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL
I found this with a quick Google search. This might work, but I definitely can't guarantee it so I'd recommend you do a backup first and run some tests.
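If converting the legacy database in place is not an option, a similar conversion could be done per string inside the import task instead. This is a different approach from the article above, not a summary of it. The sketch below rests on several assumptions: that the driver returns raw bytes, that MySQL's latin1 corresponds closely to Ruby's Windows-1252 encoding, and that a value which already parses as valid UTF-8 should be left alone. The helper name legacy_to_utf8 is hypothetical, and the validity check is a heuristic that can misjudge short or unusual strings.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper (not part of any gem): normalise a raw string read
# from the legacy database to UTF-8.
def legacy_to_utf8(raw)
  return nil if raw.nil?

  # Case 1: the bytes are already valid UTF-8, so only the tag is wrong.
  candidate = raw.dup.force_encoding("UTF-8")
  return candidate if candidate.valid_encoding?

  # Case 2: assume latin1 (treated here as Windows-1252) and transcode.
  # Bytes with no mapping are replaced instead of raising an error.
  raw.dup.force_encoding("Windows-1252").
    encode("UTF-8", :invalid => :replace, :undef => :replace)
end

# Simulated rows: the same word stored as latin1 bytes and as UTF-8 bytes.
latin1_bytes = "caf\xE9".dup.force_encoding("ASCII-8BIT")
utf8_bytes   = "caf\xC3\xA9".dup.force_encoding("ASCII-8BIT")

from_latin1 = legacy_to_utf8(latin1_bytes)
from_utf8   = legacy_to_utf8(utf8_bytes)

puts from_latin1
puts from_utf8
```

In the rake task from the question, each string column could be passed through such a helper before it is assigned, for example legacy_to_utf8(row[1]) for the title. As with the in-place conversion, it is worth trying this on a copy of the data first.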