Multithreaded application to update database
I have a C# application that inserts rows into three separate tables in a SQL Server database. This is a massive batch job (2-3M+ rows each). My code looks something like this (I've edited it to take out unnecessary detail):
string sqlCust = "INSERT INTO customer (account, name, last_order) VALUES (@account, @name, @last_order)";
string sqlOrder = "INSERT INTO orders (num, order_date) VALUES (@num, @order_date)";
string sqlOrderLines = "INSERT INTO order_lines (product) VALUES (@prod)";

db.Open();
while (GetNextCust())
{
    using (SqlCommand cmdIns = new SqlCommand(sqlCust, db.Connection))
    {
        cmdIns.Parameters.AddWithValue("@account", custAcc);
        cmdIns.Parameters.AddWithValue("@name", custName);
        cmdIns.Parameters.AddWithValue("@last_order", lastOrder);
        cmdIns.ExecuteNonQuery();
    }
    while (GetNextOrder(custAcc))
    {
        ...
        while (GetNextOrderLine(orderNum))
        {
            ...
        }
    }
}
The process is off-line, and I want to queue up as much work with the database as possible to increase throughput. My question is: is there an optimal number of threads (or is there a way to discover what this might be, other than trial and error)? Also, are there any major caveats to be wary of when doing something like this with threads?
You need to experiment. If you are reading and writing to a single source, then the optimal number of threads is probably one. If you're reading from multiple sources and writing to a single source, then two or three may yield some improvement.
In your case above, the most significant win would be to switch from transactional row-by-row inserts to SqlBulkCopy.
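A minimal sketch of that approach, reusing the customer loop from the question (the column types and the 10,000-row chunk size are assumptions to illustrate the pattern; GetNextCust and the db wrapper come from the question):

// Stream rows to the server in bulk instead of one INSERT per row.
DataTable custTable = new DataTable();
custTable.Columns.Add("account", typeof(string));
custTable.Columns.Add("name", typeof(string));
custTable.Columns.Add("last_order", typeof(DateTime));

using (SqlBulkCopy bulk = new SqlBulkCopy(db.Connection))
{
    bulk.DestinationTableName = "customer";
    bulk.BatchSize = 10000; // rows per round trip; tune by measuring

    while (GetNextCust())
    {
        custTable.Rows.Add(custAcc, custName, lastOrder);
        if (custTable.Rows.Count == 10000) // flush in chunks rather than holding 3M rows in memory
        {
            bulk.WriteToServer(custTable);
            custTable.Clear();
        }
    }
    if (custTable.Rows.Count > 0)
        bulk.WriteToServer(custTable); // final partial chunk
}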
Unless you're doing a ton of processing, I'm guessing your bottleneck will be the disk itself (the database). Ergo, it's likely the optimal number of threads will be one.
Of course, you'll have to deal with people wishing to exit the application while it's running (if it is an app), so you'll need some sort of exit check to shut down in a timely fashion.
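One possible sketch of such an exit check, assuming a CancellationTokenSource wired to Ctrl+C (the placement inside the question's loop is illustrative):

var cts = new CancellationTokenSource();
Console.CancelKeyPress += (s, e) => { e.Cancel = true; cts.Cancel(); }; // Ctrl+C requests a clean stop

while (GetNextCust())
{
    if (cts.Token.IsCancellationRequested)
        break; // exit between units of work so no customer is left half-written
    // ... insert customer, orders, order lines ...
}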
Multi-threaded applications can only process faster on multi-core machines.
If the database is the bottleneck, and it probably is, adding threads will slow down the process: in addition to the overhead of task-switching between the threads, the database will be spending more time queuing and managing the work from multiple requests.
Well, you could try some kind of connection pool: for each new user (one that calls for some DB transactions) you get a new thread (I call them DBBrockers) that handles its database access. For all of this to work you will need a multi-core machine; the more processors, the more threads get through.
Unfortunately, trial and error is your best option. It is hard to predict an optimal design beforehand because there are many factors to consider. Where does the data for your INSERTs come from? If it comes from a shared resource, multithreading might not help much. The design of the table(s) also comes into play. SQL Server is a complex database; this batch job isn't necessarily I/O bound. Network communication can matter here too, as does the SQL Server configuration.
For the optimal number of threads, again, trial and error. I'd start with two, then try bumping that number up, even beyond the number of cores you have, because there is probably a network hop between your client and the server. Also, each thread should keep its own database connection, as in the sketch below.
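A minimal sketch of that layout, assuming the input can be split into independent slices up front (GetPartitions, ProcessPartition, and connectionString are hypothetical names for illustration):

int threadCount = 2; // start low, measure, then increase
var partitions = GetPartitions(threadCount); // hypothetical: split the input into independent slices

var workers = partitions.Select(part => new Thread(() =>
{
    // SqlConnection is not thread-safe, so each worker opens its own.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        ProcessPartition(conn, part); // hypothetical: runs the INSERT loop for this slice
    }
})).ToList();

workers.ForEach(t => t.Start());
workers.ForEach(t => t.Join());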
As an alternative to client-side processing, you can upload the entire batch job input file (or whatever you have) to the server, perhaps using WCF, and then use set-based mechanisms to perform the batch insert instead of individual SQL commands.
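One such set-based mechanism is a table-valued parameter; this sketch assumes a user-defined table type dbo.CustomerType(account, name, last_order) has already been created on the server, and reuses a DataTable like the one in the SqlBulkCopy sketch above:

// One round trip inserts the whole batch via a table-valued parameter.
using (var cmd = new SqlCommand(
    "INSERT INTO customer (account, name, last_order) " +
    "SELECT account, name, last_order FROM @rows", db.Connection))
{
    SqlParameter p = cmd.Parameters.AddWithValue("@rows", custTable);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.CustomerType"; // assumed: CREATE TYPE run beforehand
    cmd.ExecuteNonQuery();
}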
Always "test and measure".