Fastest way to insert into a SQL Server table from .NET code?
What is the fastest way to do this:
- One table, no references that I cannot prefill (i.e. there is one reference key, but I have all of its data filled in)
- LOTS of data. We are talking hundreds of millions of rows per day, coming in dynamically through an API
- Requests must/should be processed as soon as feasible, in a near-real-time scenario (i.e. no writing out to a file for one upload per day). 2 seconds is the normal maximum delay
- Separate machines for data / application and the SQL Server
What I do now:
- Aggregate up to 32*1024 rows into an array, then queue it.
- Read the queue on 2-3 threads and insert into the database using SqlBulkCopy (a rough sketch of this follows below).
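For what it's worth, here is a minimal sketch of that batching pattern, assuming System.Data.SqlClient; the type, column and table names (dbo.ticks) are made up, not the real schema:

// Rough sketch only: TickWriter, the column names and dbo.ticks are assumptions.
using System;
using System.Collections.Concurrent;
using System.Data;
using System.Data.SqlClient;

static class TickWriter
{
    const int BatchSize = 32 * 1024;

    // The API side builds DataTable batches of up to 32k rows and queues them here.
    public static readonly BlockingCollection<DataTable> Batches = new BlockingCollection<DataTable>(16);

    public static DataTable NewBatch()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("ts", typeof(DateTime));
        dt.Columns.Add("instrument_id", typeof(int));
        dt.Columns.Add("data_type", typeof(string));   // char(1)
        dt.Columns.Add("price", typeof(double));
        dt.Columns.Add("volume", typeof(int));
        return dt;
    }

    // Run 2-3 of these on separate threads, each with its own connection.
    public static void WriterLoop(string connectionString)
    {
        foreach (DataTable batch in Batches.GetConsumingEnumerable())
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();
                // TableLock can help minimal logging into a heap; drop it if parallel writers block each other.
                using (SqlBulkCopy bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.TableLock, null))
                {
                    bulk.DestinationTableName = "dbo.ticks";
                    bulk.BatchSize = BatchSize;
                    bulk.BulkCopyTimeout = 0;
                    bulk.WriteToServer(batch);
                }
            }
        }
    }
}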
I get about 60k-75k rows imported per second, which is not enough but quite close. I would love to hit 250,000 rows per second.
So far nothing is really maxed out. I get 20% of the time in "network I/O" blocks and have one core 80% loaded on the CPU side. The discs are writing out 7-14 MB/s and are mostly idle. Average queue length on a RAID 10 of 6 Raptors is... 0.25.
Does anyone have an idea how to speed this up? A faster server (so far it is virtual: 8 GB RAM, 4 cores, physical disc pass-through for the data)?
Adding some clarifications:
- This is a 2008 R2 Enterprise SQL Server on a 2008 R2 server. The machine has 4 cores and 8 GB RAM, all 64 bit. The 80% load is on one core; the machine as a whole shows about 20% CPU load.
- The table is simple and has no primary key, only an index on a relational reference (the instrument reference) and a timestamp that is unique only within a set of instruments (so uniqueness is not enforced).
- The fields in the table are: timestamp, instrument reference (no enforced foreign key), data type (char(1), one of a number of characters indicating what data is posted), price (double) and volume (int). As you can see, this is a VERY thin table. The data in question is tick data for financial instruments.
- The question is also about hardware etc., mostly because I see no real bottleneck. I am inserting in multiple transactions, and that gives me a benefit, but a small one. Discs and CPU are not showing significant load; network IO wait is high (300 ms/second, 30% at the moment), but this is on the same virtualization platform, which runs JUST the two servers and has enough cores to run both. I am pretty much open to "buy another server", but I want to identify the bottleneck first... especially given that at the end of the day I am not grasping what the bottleneck is. Logging is irrelevant - the bulk inserts do NOT go into the log as data (no clustered index).
Would splitting the table help, for example by a byte (tinyint) that partitions the instrument universe across, say, 16 tables, so that I could run up to 16 inserts at the same time? Since the data actually comes from different exchanges, I could make one partition per exchange; that would be a natural split field (it actually lives on the instrument, but I could duplicate it here). A rough sketch of this is below.
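Purely as a sketch of that idea (the per-exchange table names dbo.ticks_0 .. dbo.ticks_15 are assumptions), the client side could keep one batch per exchange and run one SqlBulkCopy stream per partition in parallel:

// Sketch of per-exchange splitting: one SqlBulkCopy stream per partition, run in parallel.
// Table names and the partition count are assumptions.
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

static class PartitionedWriter
{
    public static void InsertPartitioned(DataTable[] batchesPerExchange, string connectionString)
    {
        Parallel.For(0, batchesPerExchange.Length, exchangeId =>
        {
            DataTable batch = batchesPerExchange[exchangeId];
            if (batch == null || batch.Rows.Count == 0) return;

            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
                {
                    // One target table per exchange, e.g. dbo.ticks_0 .. dbo.ticks_15.
                    bulk.DestinationTableName = "dbo.ticks_" + exchangeId;
                    bulk.BulkCopyTimeout = 0;
                    bulk.WriteToServer(batch);
                }
            }
        });
    }
}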
Some more clarifications: I got the speed even higher (90k rows/second); it is now clearly limited by network IO between the machines, which could be VM switching.
What I do now is open a connection per 32k rows, create a temp table, insert into it with SqlBulkCopy, THEN use ONE SQL statement to copy into the main table - this minimizes any lock time on the main table. A sketch of this is below.
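A minimal sketch of that staging pattern, using the same made-up table and column names as above (dbo.ticks, #ticks_stage), looks roughly like this:

// Sketch of the staging approach: bulk copy into a temp table, then one INSERT ... SELECT
// into the main table. Table and column names are assumptions.
using System.Data;
using System.Data.SqlClient;

static class StagingWriter
{
    public static void InsertViaStaging(DataTable batch, string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // The temp table lives only for the lifetime of this connection.
            using (SqlCommand create = new SqlCommand(
                "CREATE TABLE #ticks_stage (ts datetime, instrument_id int, data_type char(1), price float, volume int)", conn))
            {
                create.ExecuteNonQuery();
            }

            using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "#ticks_stage";
                bulk.BulkCopyTimeout = 0;
                bulk.WriteToServer(batch);
            }

            // One statement moves the whole batch, keeping lock time on the main table short.
            using (SqlCommand move = new SqlCommand(
                "INSERT INTO dbo.ticks (ts, instrument_id, data_type, price, volume) " +
                "SELECT ts, instrument_id, data_type, price, volume FROM #ticks_stage", conn))
            {
                move.CommandTimeout = 0;
                move.ExecuteNonQuery();
            }
        }
    }
}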
Most waiting time is still on network IO. It seems I am running into VM-related issues. I will move to physical hardware in the next months ;)
If you manage 70k rows per second, you're very lucky so far. But I suspect it's because you have a very simple schema.
I can't believe you're asking about this kind of load on:
- virtual server
- single array
- SATA disks
The network and CPUs are shared and IO is restricted: you can't use all the resources, so any load stats you see are not very useful. I suspect the network load you see is traffic between the 2 virtual servers, and that you'll become IO bound if you resolve this.
Before I go on, read this: "10 lessons from 35K tps". He wasn't using a virtual box.
Here is what I'd do, assuming no SAN and no DR capability if you want to ramp up volumes.
- Buy 2 big physical servers; the exact CPU is kind of irrelevant, max out the RAM, go with an x64 install
- Disks + controllers = fastest spindles, fastest SCSI. Or a stonking great NAS
- Gigabit (1000 Mbit) or faster NICs
- RAID 10 with 6-10 disks for one log file, for your database only
- Remaining disks as RAID 5 or RAID 10 for the data file
For reference, our peak load is 12 million rows per hour (16 core, 16GB, SAN, x64) but we have complexity in the load. We are not at capacity.
From the answers I read here, it seems you really have a hardware problem rather than a code problem. Ideally, you'll get your performance boosts by making available more disk I/O or network bandwidth, or by running the program on the same virtual machine that hosts the database.
However I do want to share the idea that table parameter inserts are really ideal for big data transfer; although SqlBulkCopy appears to be just as fast, it's significantly less flexible.
I wrote an article about the topic here: http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
The overall answer is that you roughly want to create a table type:
CREATE TYPE item_drop_bulk_table_rev4 AS TABLE (
    item_id BIGINT,
    monster_class_id INT,
    zone_id INT,
    xpos REAL,
    ypos REAL,
    kill_time datetime
)
Then, you create a stored procedure to copy from the table parameter into the actual table directly, so there are fewer in-between steps:
CREATE PROCEDURE insert_item_drops_rev4
    @mytable item_drop_bulk_table_rev4 READONLY
AS
INSERT INTO item_drops_rev4
    (item_id, monster_class_id, zone_id, xpos, ypos, kill_time)
SELECT
    item_id, monster_class_id, zone_id, xpos, ypos, kill_time
FROM
    @mytable
The .NET code behind it looks like this:
DataTable dt = new DataTable();
dt.Columns.Add(new DataColumn("item_id", typeof(Int64)));
dt.Columns.Add(new DataColumn("monster_class_id", typeof(int)));
dt.Columns.Add(new DataColumn("zone_id", typeof(int)));
dt.Columns.Add(new DataColumn("xpos", typeof(float)));
dt.Columns.Add(new DataColumn("ypos", typeof(float)));
dt.Columns.Add(new DataColumn("kill_time", typeof(DateTime)));

// item_id, monster_class_id, zone_id, xpos and ypos come from your own data source.
for (int i = 0; i < MY_INSERT_SIZE; i++) {
    dt.Rows.Add(new object[] { item_id, monster_class_id, zone_id, xpos, ypos, DateTime.Now });
}

// Now we're going to do all the work with one connection!
using (SqlConnection conn = new SqlConnection(my_connection_string)) {
    conn.Open();
    using (SqlCommand cmd = new SqlCommand("insert_item_drops_rev4", conn)) {
        cmd.CommandType = CommandType.StoredProcedure;

        // Adding a "structured" parameter allows you to insert tons of data with low overhead
        SqlParameter param = new SqlParameter("@mytable", SqlDbType.Structured);
        param.Value = dt;
        cmd.Parameters.Add(param);
        cmd.ExecuteNonQuery();
    }
}
Are there any indexes on the table that you could do without? EDIT: asking while you were typing.
Is it possible to turn the price into an integer, and then divide by 1000 or whatever on queries?
Have you tried adding a pk to the table? Does that improve the speed?
There is also a set-based way to use tally tables to import CSV data: http://www.sqlservercentral.com/articles/T-SQL/62867/ (near the bottom; requires free registration, but worth it).
You might like to try that and test its performance... with a small, properly indexed tally table.
It is all slow.
Some time ago we solved a similar problem (inserting tens of thousands of price records into the DB; as I remember it was about 50K per time frame, and we had about 8 time frames that all clashed at :00, so it was about 400K records), and it worked very fast for us (MS SQL 2005). Imagine how it would work today (SQL 2012):
<...init...>
if(bcp_init(m_hdbc, TableName, NULL, NULL, DB_IN) == FAIL)
return FALSE;
int col_number = 1;
// Bind each member of m_sd to its table column, in ordinal order.
// bcp_bind arguments: data address, indicator length, data length, terminator, terminator length, data type, column number.
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.SymbolName, 0, 16, (LPCBYTE)"", 1, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Time, 0, 4, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Open, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.High, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Low, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Close, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
if(bcp_bind(m_hdbc, (BYTE *)&m_sd.Volume, 0, 8, 0, 0, 0, col_number++) == FAIL) return FALSE;
<...save into sql...>
BOOL CSymbolStorage::Copy(SQL_SYMBOL_DATA *sd)
{
    if(!m_bUseDB)
        return TRUE;

    // Copy the row into the bound buffer and send it to the server.
    memcpy(&m_sd, sd, sizeof(SQL_SYMBOL_DATA));
    if(bcp_sendrow(m_hdbc) != SUCCEED)
        return FALSE;

    return TRUE;
}
Could you use horizontal partitioning? See: http://msdn.microsoft.com/en-us/library/ms178148.aspx & http://msdn.microsoft.com/en-us/library/ms188706.aspx
You might also want to look at this question, and possibly change the recovery model: Sql Server 2008 Tuning with large transactions (700k+ rows/transaction)
Some questions: What edition of SQL Server are you using?
Why is the one core at 80%? That might be the bottleneck, so is probably something worth investigating.
What OS are you using, and is it 64 bit?