Ruby: would using Fibers increase my DB insert throughput?
Currently I am using Ruby 1.9.1 and the 'ruby-mysql' gem, which unlike the 'mysql' gem is written in ruby only. This is pretty slow actually, as it seems to insert at a rate of almost 1 per second (SLOOOOOWWWWWW). And I have a lot of inserts to make too, its pretty much what this script does ultamitely. I am using just 1 connection (since I am using just one thread). I am hoping to speed things up by creating a fiber that will
- create a new DB connection
- insert 1-3 records
- close the DB connection
I would imagine launching 20-50 of these wou开发者_Go百科ld greatly increase DB throughput. Am I correct to go along this route? I feel that this is the best option, as opposed to refactoring all of my DB code since the syntax is a quite a bit different for the default mysql driver :(
Why are you closing a connection after a few inserts? Your script should be:
- Open a connection
- Write however many records you need
- Close connection
Transactions can complicate this.
If you are doing non-transactional inserts, absolutely use one connection and write each record one at a time.
If you are using transactions and writing a truly huge data set (ie millions of rows) then your database may need to be tuned so it has sufficient temporary storage to deal with that.
Basically if a single insert is taking a second it suggests to me one or more of:
- You foreign keys aren't indexed so the referential integrity checks on the insert are woefully inefficient; or
- Excessive connection opening and closing.
Now all that being said, having multiple "workers" doing inserts can lead to useful increases in overall throughput but at 1 second per insert, that isn't your problem. You need to find what your problem is.
With truly huge batch inserts it may also be worth disabling check constraints, foreign keys and indices, doing all the inserts and then re-enabling them.
Since you are using ruby1.9.1, just use Threads. They are no longer green. Fibers are not going to help as they are too light weight for multiple, concurrent DB connections.
精彩评论