Doing huge data migrations in rails
I am experiencing big time and computing power challenges when doing big data migration (several 100.000 rows). I am developing a service that handles a lot of data in rails. Our models are constantly changing as we get more and more cleaver about our design. This leads to a lot of migrations on our database which is a Postgres 9.0 database. Often these migrations also includes some kind of migration on the data itself. Yesterday we found out that we needed to move a 'text' attribute on a model into a separate model so that the attribute was no longer just an attribut开发者_StackOverflow中文版e on the model but a one to many relationship instead.
My migration looked somewhat like this:
def self.up
create_table :car_descriptions do |t|
t.integer :car_id
t.text :description
t.timestamps
end
Car.find_each do |car|
if car.description.present?
car.descriptions.build :description => car.description
end
car.save
end
remove_column :cars, :description
end
Now the problem is, that this is running pretty slow, and even worse, if I set a counter, and prints out the progress, I can see that the migration is running slower and slower over time. In my activity monitor I can see that the ruby process is taking up more and more memory.
So my question is - is there a better way to do big data migrations like this?
You should not use ActiveRecord here for migrating the data from your Car
model to CarDescription
model. Instead you should retort to running raw sql(run from migration). In my last job, we had such problems with huge data, and running raw sqls resulted in much quicker migrations(although quick was still 5-6 hours sometimes). One other practice we developed over time after many bitter experiences was, we always copied our database from production to our staging server and ran the migration atleast twice on the staging. We always came up with some process(specific to migration) which was a huge time saver following this practice. Sometimes the process included, manually dropping some indexes, running the migration and manually creating those indexes again.
In current case, the sql can look something like this:
INSERT INTO car_descriptions(car_id, description) SELECT id, description FROM cars
Hope you find it useful, let me know if I can add something to the answer.
精彩评论