Java Large database inserts
I have a开发者_开发知识库 database in which I need to insert batches of data (around 500k records at a time). I was testing with derby and was seeing insert times of about 10-15minutes for this many records (I was doing a batch insert in Java).
Does this time seem slow (working on your average laptop)? And are there approaches to speeding it up?
thanks,
Jeff
This time seems perfectly reasonable, and is in agreement with times I have observed. If you want it to go faster, you need use bulk insert options and disable safety features:
- Use PreparedStatements and batches of 5,000 to 10,000 records unless it MUST be one transaction
- Use bulk loading options in the DBMS
- Disable integrity checks temporarily for insert
- Disable indexes temporarily or delete indexes and re-create them post-insert
- Disable transaction logging and re-enable afterward.
EDIT: Database transactions are limited by disk I/O, and on laptops and most hard drives, the important number is seek time for the disk.
Laptops tend to have rather slow disks, at 5400 rpm. At this speed, seek time is about 5 ms. If we assume one seek per record (an over-estimate in most cases), it would take 40 minutes (500000 * 5 ms) to insert all rows. Now, the use of caching mechanisms and sequencing mechanisms reduces this somewhat, but you can see where the problem comes from.
I am (of course) vastly oversimplifying the problem, but you can see where I'm going with this; it's unreasonable to expect databases to perform at the same speed as sequential bulk I/O. You've got to apply some sort of indexing to your record, and that takes time.
精彩评论