What Are the Performance Differences Between Running One vs Many Inserts

I'm building a script that I know will need to insert multiple rows. I'm doing this in Perl, so in terms of parameterization, it's much easier to insert each row individually. In terms of speed, I'm guessing that running just one insert statement will be faster (although latency will be relatively low, as I'm quite close to the database itself). I expect the script to insert about 20-40 rows per run on average. That said, what would be the approximate performance difference between running a single INSERT INTO statement vs. running one for each row? Note: The server is running SQL Server 2008.

[EDIT]Since there seems to be a lot of confusion, I'd like to clarify that what I'm really asking for is the theory behind how a multi-row insert is handled by SQL Server 2008. Does it essentially just convert it internally into a bunch of individual insert statements and run those over one connection, or does it do something more intelligent?

Yes, I know I can run timed loops. No, that's not what I'm asking for. [/EDIT]


Combining multiple inserts into one command will generally execute more quickly than executing separate inserts. The reasons are:

  • A lot of work goes into parsing the SQL. With the multi-row version, the statement is parsed only once.
  • More work goes into checking permissions. Again, this is done only once.
  • Database connections are "chatty". With the multi-row version, the request/response exchange with the server happens only once instead of once per row. You really notice this on a poor network connection.
  • Finally, the multi-row version gives the server an opportunity to optimize the operation.
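
To make the "one statement, parsed once" point concrete, here is a minimal sketch of building a single parameterized multi-row INSERT. It is written in Python against an in-memory SQLite database purely so that it runs anywhere; the table `items`, its columns, and the helper `build_multi_row_insert` are hypothetical names, not anything from the question. SQL Server 2008 accepts the same `VALUES (...), (...)` form, and the same idea carries over to Perl placeholders, but the driver calls would differ.

```python
import sqlite3


def build_multi_row_insert(table, columns, rows):
    """Return (sql, flat_params) for one parameterized multi-row INSERT.

    table and columns are interpolated into the SQL text, so they must be
    trusted identifiers; only the row values are bound as parameters.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join(row_placeholder for _ in rows),
    )
    # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders.
    flat_params = [value for row in rows for value in row]
    return sql, flat_params


# Hypothetical data: 30 rows, roughly the size mentioned in the question.
rows = [(i, "name-{}".format(i)) for i in range(30)]

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

sql, params = build_multi_row_insert("items", ["id", "name"], rows)
conn.execute(sql, params)  # one statement, one exchange with the database
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(inserted)
```

The parameterization stays simple: the placeholder list grows with the number of rows, and the values are passed as one flat list. Databases cap the number of rows and parameters per statement, so very large inputs would still need to be split into several statements.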


There is a general idea to let the SQL database do its thing and not try to treat the database as some sort of raw disk storage. I've often seen a developer read from one table, then another, or run a general query and then loop through each row to see if it's the one they want. Generally, it's better to let the SQL database do its thing.

In this case, I can't really see much advantage of a single multi-row insert over multiple single-row inserts. I guess there might be some, because you don't have to do multiple prepares and commits.

It shouldn't be too difficult to actually create a temporary database and try this out. Create a table with two columns, and have the program generate data to toss into it. Give yourself a decent amount to do. For example, how many items will this table have? And how many do you think you'll be inserting at once? Say, create a table of 1,000,000 items, and insert into this table 1000 items at a time, 100 items at a time, and one item at a time. Just generate the data with an incrementing counter. There may be a "sweet spot" in the number of items you can insert at once.
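
The experiment described above could be sketched roughly as follows. This is an illustration under loud assumptions: it uses Python with an in-memory SQLite database rather than SQL Server, the table `t` and the function `time_inserts` are made-up names, and the sizes are scaled down (10,000 rows; batches of 1, 10, and 100) so it finishes quickly. The timings it prints say nothing about SQL Server 2008 itself; only the shape of the test carries over.

```python
import sqlite3
import time


def time_inserts(total_rows, batch_size):
    """Insert total_rows two-column rows, batch_size rows per statement.

    Returns (elapsed_seconds, rows_in_table).
    """
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
    conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
    start = time.perf_counter()
    for first in range(0, total_rows, batch_size):
        last = min(first + batch_size, total_rows)
        # Generate data with a simple incrementing counter.
        batch = [(i, i * 2) for i in range(first, last)]
        placeholders = ", ".join("(?, ?)" for _ in batch)
        flat = [value for row in batch for value in row]
        conn.execute("INSERT INTO t (id, val) VALUES " + placeholders, flat)
        conn.commit()  # one commit per statement, like auto-commit
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


results = {size: time_inserts(10_000, size) for size in (1, 10, 100)}
for size, (elapsed, count) in results.items():
    print("batch size {:>3}: {} rows in {:.4f} s".format(size, count, elapsed))
```

Running the same loop several times and against the real server, with the real table and indexes, would be needed before drawing any conclusion about a sweet spot.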

In my unbiased and always correct opinion, you'll probably find that the difference isn't worth fretting over, and you should instead use the method that makes your code the easiest to maintain.

I have a programming dictum: The place where you want to optimize your code is probably the wrong place. We like efficiency, but we usually attack the wrong item. And whatever we've squeezed out in terms of efficiency, we end up wasting in maintenance.

So, just program what is the easiest to understand and don't fret about being overly efficient.


Just to add a couple of other performance differentiators to think about on insertion:

  • Foreign Keys - If the table you are inserting into has foreign keys, SQL Server effectively needs to join to the referenced tables on insert to validate each row. When you do your inserts in one query, SQL Server can be more efficient in doing these joins.

  • Transactions - As you don't mention transactions, I assume you are using SQL Server's auto-commit mode, in which each statement runs in its own transaction. With such a small number of rows, the overhead of creating 40 transactions vs. 1 transaction is likely higher than the cost of maintaining the log to allow rollback. However, if you were inserting 400000 rows, a single statement/transaction would likely be more expensive than 400000 separate inserts, as the cost of being prepared to roll back up to 400000 rows is very high. For that many rows it is usually best to insert in batches; the optimal batch size can be determined through testing. Also, above a certain row count, it may become more efficient to disable the foreign keys, insert the rows, then re-enable them.
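
The batching idea from the transactions point can be sketched like this: one transaction per batch, so a failure only rolls back the current batch. As before, this is a hedged illustration in Python with an in-memory SQLite database standing in for SQL Server; the table `orders`, the function `insert_in_batches`, and the batch size of 1000 are all assumptions chosen for the example, not recommendations.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size):
    """Insert rows with one transaction per batch.

    Returns the number of batches committed. If a batch fails, only that
    batch is rolled back; earlier batches stay committed.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        try:
            # The sqlite3 module opens a transaction implicitly before
            # the first INSERT; commit() ends it for this batch.
            conn.executemany(
                "INSERT INTO orders (id, amount) VALUES (?, ?)", chunk
            )
            conn.commit()
        except sqlite3.Error:
            conn.rollback()
            raise
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not SQL Server
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")

rows = [(i, i % 7) for i in range(2500)]  # hypothetical generated data
batches = insert_in_batches(conn, rows, batch_size=1000)
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(batches, total)
```

The trade-off is the one described above: larger batches mean fewer commits but more work to undo if something fails, so the batch size is something to settle by testing against the real server.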
