Testing algorithm against large dataset

I'm implementing a statistical algorithm that needs access to a large sample dataset for proper testing. By large, I mean 50,000 rows in a single MySQL table.

I would like to test with traditional RSpec methods, but creating the sample set and loading it into the DB leads to two problems:

  • Extremely slow/intensive using Active Record create. I haven't explored the various options for skipping validations on create, since the models are pretty basic and I assume it won't make a huge speed difference
  • Improper cleanup using a hacky mysqlimport (meaning data left in the database after the test, despite an explicit call to DatabaseCleaner in an :after block)

Creating the object graph in memory is a possibility, but, not being a mockist, I'm a little afraid to override AR functionality.

Any ideas, best practices?

Thanks! Justin


It's only a partial answer, but:

  • Extremely slow/intensive using Active Record create. (...) I assume it won't make a huge speed difference

It actually makes a big speed difference. PostgreSQL has a good guide on this:

http://www.postgresql.org/docs/9.0/interactive/populate.html

Most of it applies to MySQL directly:

  • Use a single transaction rather than many small ones (see the sketch after this list).
  • Use LOAD DATA INFILE: http://dev.mysql.com/doc/refman/5.5/en/load-data.html
  • Remove indexes and recreate them after the inserts.
  • Disable foreign-key constraints while loading your data (this assumes your data is clean, of course).
  • Give MySQL plenty of resources.
  • Disable replication if applicable.
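
To make that concrete, here is a minimal Ruby sketch of the first four tips. The Sample model, the samples table, and the CSV path are all hypothetical stand-ins (none appear in the question), and LOAD DATA LOCAL INFILE requires local_infile to be enabled on the server:

    require "active_record"

    conn = ActiveRecord::Base.connection

    # Tip 1: one transaction around all the inserts avoids a commit
    # (and disk flush) per row.
    ActiveRecord::Base.transaction do
      50_000.times { Sample.create!(value: rand) }  # hypothetical model/column
    end

    # Tips 2-4: skip Active Record entirely and bulk-load from a file
    # with constraint checks relaxed. Note that DISABLE KEYS only skips
    # index maintenance on MyISAM tables, and ALTER TABLE implicitly
    # commits, so these statements won't roll back.
    conn.execute("SET foreign_key_checks = 0")        # assumes clean data
    conn.execute("ALTER TABLE samples DISABLE KEYS")
    conn.execute(
      "LOAD DATA LOCAL INFILE '/tmp/samples.csv' " \
      "INTO TABLE samples " \
      "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
    )
    conn.execute("ALTER TABLE samples ENABLE KEYS")
    conn.execute("SET foreign_key_checks = 1")

Even the single-transaction version alone tends to be dramatically faster than 50,000 separate creates, each of which commits individually.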

  • Improper cleanup using a hacky mysqlimport (meaning data left in the database after the test, despite an explicit call to DatabaseCleaner in an :after block)

If you want to flush your tables of all their data, try TRUNCATE TABLE:

http://dev.mysql.com/doc/refman/5.5/en/truncate-table.html
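
One way to get that from RSpec, sketched here with DatabaseCleaner's standard API rather than anything from the question, is to switch DatabaseCleaner to its truncation strategy:

    require "database_cleaner"

    RSpec.configure do |config|
      config.before(:suite) do
        DatabaseCleaner.strategy = :truncation
        DatabaseCleaner.clean_with(:truncation)  # start the suite clean
      end

      config.around(:each) do |example|
        # Wraps each example in start/clean, truncating the tables after.
        DatabaseCleaner.cleaning { example.run }
      end
    end

This also explains the leftover data: rows loaded by mysqlimport live outside any test transaction, so the default :transaction strategy can't roll them back, while :truncation wipes them regardless of how they got there.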
