Testing algorithm against large dataset
I'm implementing a statistical algorithm that needs access to a large sample dataset for proper testing — large meaning 50,000 rows in a single MySQL table.
I would like to use traditional RSpec methods to test, but creating the sample set and loading it into the DB leads to two problems.
- Creating the rows with ActiveRecord create is extremely slow/intensive. I haven't explored options for skipping validations, since the models are pretty basic and I assume it won't make a huge speed difference.
- Improper cleanup when using a hacky mysqlimport — meaning data is left in the database after the test, despite an explicit call to DatabaseCleaner in an :after block.
Creating the object graph in-memory is a possibility, but not being a mockist I'm a little afraid to override AR functionality.
Any ideas, best practices?
Thanks! Justin
It's only a partial answer, but:
- Extremely slow/intensive using Active Record create. (...) I assume it won't make a huge speed difference
It actually is a big speed difference. PostgreSQL has a good guide on this:
http://www.postgresql.org/docs/9.0/interactive/populate.html
Most of it applies to MySQL directly:
- Use a single transaction, rather than many of them.
- Load data from a file with LOAD DATA INFILE: http://dev.mysql.com/doc/refman/5.5/en/load-data.html
- Remove indexes and recreate them after the inserts.
- Disable fkey constraints while loading your data (assumes your data is clean, of course).
- Give MySQL plenty of resources.
- Disable replication if applicable.
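On the single-transaction point: even without LOAD DATA INFILE, batching rows into multi-row INSERT statements inside one transaction avoids 50,000 individual ActiveRecord saves. Here's a minimal sketch — the table and column names are hypothetical, and the naive quoting is only safe for trusted test fixtures:

```ruby
# Build multi-row INSERT statements in batches, so 50,000 rows become a
# handful of round-trips instead of 50,000 individual saves.
# NOTE: the quoting here is naive and only acceptable for trusted test
# data; real input should go through the adapter's quoting.
def bulk_insert_sql(table, columns, rows, batch_size: 1000)
  rows.each_slice(batch_size).map do |batch|
    values = batch.map do |row|
      "(#{row.map { |v| v.is_a?(String) ? "'#{v}'" : v }.join(', ')})"
    end
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')};"
  end
end
```

Usage sketch (assumes ActiveRecord; `measurements` and `samples` are made-up names):

```ruby
ActiveRecord::Base.transaction do
  conn = ActiveRecord::Base.connection
  conn.execute("SET foreign_key_checks = 0")  # per the fkey tip above
  bulk_insert_sql("measurements", %w[value label], samples).each do |sql|
    conn.execute(sql)
  end
  conn.execute("SET foreign_key_checks = 1")
end
```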
- Improper cleanup using a hacky mysqlimport (meaning data left in the database after test, despite an explicit call to DatabaseCleaner in an :after block)
If you want to flush your tables of all their data, try truncate:
http://dev.mysql.com/doc/refman/5.5/en/truncate-table.html
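As for DatabaseCleaner: rows loaded via mysqlimport arrive outside the test's transaction, so the default :transaction strategy can't roll them back. Switching to the :truncation strategy should remove them. A sketch of the RSpec configuration (a hypothetical spec_helper.rb):

```ruby
RSpec.configure do |config|
  config.before(:suite) do
    # Truncation works regardless of how the data got in,
    # unlike the transaction strategy.
    DatabaseCleaner.strategy = :truncation
    DatabaseCleaner.clean_with(:truncation)
  end

  config.around(:each) do |example|
    DatabaseCleaner.cleaning { example.run }
  end
end
```

Truncation is slower than a rollback, so you may want to scope it to just the specs that use the imported dataset.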