Slow select when inserting large amounts of data (MYSQL)
I have a process that imports a lot of data (950k rows) using inserts that insert 500 rows at a time. The process generally takes about 12 hours,开发者_如何学C which isn't too bad. Normally doing a query on the table is pretty quick (under 1 second) as I've put (what I think to be) the proper indexes in place. The problem I'm having is trying to run a query when the import process is running. It is making the query take almost 2 minutes! What can I do to make these two things not compete for resources (or whatever)? I've looked into "insert delayed" but not sure I want to change the table to MyISAM.
Thanks for any help!
Have you tried using priority hints?
SELECT HIGH_PRIORITY ...
and INSERT LOW_PRIORITY ...
12 hours to insert 950k rows is pretty heavy duty. How big are these rows? What kind of indexes are on them? Even if the actual data insertion goes quickly, the continual updating of the indexes will definitely cause performance degradation for anything using those table(s) at the time.
Are you doing these imports with the bulk INSERT syntax (insert into tab (x) values (a), (b), (c), etc...) or one INSERT per row? Doing the bulk insert will require a longer index updating period (as it has to generate index data for 500 rows) than doing it for a single row. There will be no doubt be some sort of internal lock placed on the indexes while the data's updated, in which case you're contending with 950k/500 = 1,900 locking sessions at minimum.
I found that on some of my bulk-insert scripts (an http log analyzer for some custom data mining), it was quicker to DISABLE indexes on the relevant tables, then reenabling/rebuilding them after the data dump was completed. If I remember right, it was about 37 minutes to insert 200,000 rows of hit data with keys enabled, and about 3 minutes with no indexing.
So I finally found the slowdown while searching during the import of my data. I had one query like this:
SELECT * FROM `properties` WHERE (state like 'Florida%') and (county like 'Hillsborough%') ORDER BY created_at desc LIMIT 0, 50
and when I ran an EXPLAIN on it, I found out it was scanning around 215,000 rows (even with proper indexes on state and county in place). I then ran an EXPLAIN on the following query:
SELECT * FROM `properties` WHERE (state = 'Florida') and (county = 'Hillsborough') ORDER BY created_at desc LIMIT 0, 50
and saw that it only had to scan 500 rows. Considering that the actual result set was something like 350, I think I identified the slowdown.
I've made the switch to not using "like" in my queries and am very happy with the snappier results.
Thanks to everyone for your help and suggestions. They are much appreciated!
You can try import your data to some auxiliary table and then merge it into the main table. You don't lose performance in your main table, and I think your db can manage the merge much faster than the multiple insertions.
精彩评论