Smartest way to import massive datasets into a Rails application?
I've got multiple massive (multi-gigabyte) datasets I need to import into a Rails app. The datasets are currently each in their own database on my development machine, and I need to read from them and create rows in tables in my Rails database based on the information they contain. The tables in my Rails database will not be exactly the same as the tables in the source databases.
What's the smartest way to go about this?
I was thinking migrations, but I'm not exactly sure how to connect a migration to the other databases, and even if that's possible, won't it be ridiculously slow?
Without seeing the schemas or knowing the logic you want to apply to each row, I would say the fastest way to import this data is to create a view of the table you want to export, in the column order you want (doing any per-row processing in SQL), and then do a SELECT ... INTO OUTFILE on that view. You can then take the resulting file and import it into the target database.
This will not allow you to use any Rails model validations on the imported data, though.
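As a rough sketch of that approach, assuming MySQL and made-up table, view, and column names (legacy_users, users_export, and so on), it might look something like this from a plain Ruby script:

```ruby
require "mysql2"

# Connect directly to the legacy database (not through Rails).
source = Mysql2::Client.new(host: "localhost", username: "root", database: "legacy_db")

# Build a view that reshapes the legacy table into the columns the Rails table expects.
source.query(<<~SQL)
  CREATE OR REPLACE VIEW users_export AS
  SELECT legacy_id AS external_id, full_name AS name, email_address AS email
  FROM legacy_users
SQL

# Dump the view straight to a CSV file on the database server.
source.query(<<~SQL)
  SELECT * INTO OUTFILE '/tmp/users_export.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\\n'
  FROM users_export
SQL

# Bulk-load the file into the Rails table; this bypasses ActiveRecord entirely.
target = Mysql2::Client.new(host: "localhost", username: "root", database: "rails_app_development")
target.query(<<~SQL)
  LOAD DATA INFILE '/tmp/users_export.csv'
  INTO TABLE users
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\\n'
  (external_id, name, email)
SQL
```

Note that SELECT ... INTO OUTFILE and LOAD DATA INFILE write and read files on the database server itself, which is why this is so much faster than going row by row through Ruby.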
Otherwise, you'll have to go the slow route and create a model for each source database/table to extract the data (http://programmerassist.com/article/302 explains how to connect a given model to a different database) and import it that way. This will be quite slow, but you could set up an EC2 monster instance and run it as fast as possible.
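A minimal sketch of the model-per-source-table approach, assuming a MySQL legacy database and placeholder class, table, and column names for illustration:

```ruby
class LegacyUser < ActiveRecord::Base
  # Point this model at the legacy database instead of the Rails app's own database.
  establish_connection(
    adapter:  "mysql2",
    host:     "localhost",
    username: "root",
    database: "legacy_db"
  )
  self.table_name = "legacy_users"
end

# Walk the legacy table in batches and create rows in the Rails schema,
# running all model validations along the way (this is the slow part).
LegacyUser.find_each(batch_size: 1000) do |legacy|
  User.create!(
    external_id: legacy.legacy_id,
    name:        legacy.full_name,
    email:       legacy.email_address
  )
end
```

find_each pulls rows in batches so a multi-gigabyte table never has to fit in memory, but every row still goes through ActiveRecord and your validations, which is where all the time goes.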
Migrations would technically work, but I wouldn't recommend them for something like this.
Since georgian suggested it, I'll post my comment as an answer:
If the changes are superficial (column names changed, columns removed, etc.), then I would just manually export the data from the old database into the new one, and then run a migration to change the columns.
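As a rough sketch (table and column names are made up), the follow-up migration for that kind of superficial change could be as simple as:

```ruby
class AdjustImportedUsers < ActiveRecord::Migration
  def up
    rename_column :users, :full_name, :name   # column renamed in the new schema
    remove_column :users, :obsolete_flag      # column dropped entirely
  end

  def down
    add_column    :users, :obsolete_flag, :boolean
    rename_column :users, :name, :full_name
  end
end
```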