
Process 46,000 rows of a document in groups of 1000 using C# and Linq

I have the code below, which runs, but the text file has 46,000 records that I need to process and insert into the database. It takes forever if I just call it directly and loop through one record at a time.

I was trying to use LINQ to pull every 1,000 rows or so and hand them off to a thread, so I could process 3,000 rows at once and cut the processing time. I can't figure it out, though, so I need some help.

Any suggestions would be welcome. Thank you in advance.

var reader = ReadAsLines(tbxExtended.Text);
var ds = new DataSet();
var dt = new DataTable();

// Build the columns from the pipe-delimited header list
string headerNames = "Long|list|of|strings|";
var headers = headerNames.Split('|');
foreach (var header in headers)
    dt.Columns.Add(header);

// Skip the header row, then add one DataRow per line
var records = reader.Skip(1);
foreach (var record in records)
    dt.Rows.Add(record.Split('|'));

ds.Tables.Add(dt);
ds.AcceptChanges();

ProcessSmallList(ds);


If you are looking for high performance and you are using SQL Server, look at SqlBulkCopy. Its performance is significantly better than inserting row by row.

Here is an example using a custom CSVDataReader that I used for a project, but any IDataReader-compatible reader (SqlDataReader, OleDbDataReader, etc.), a DataRow[], or a DataTable can be passed to WriteToServer.

Dim sr As CSVDataReader
Dim sbc As SqlClient.SqlBulkCopy

' Lock the destination table and keep source identity values during the copy
sbc = New SqlClient.SqlBulkCopy(mConnectionString, SqlClient.SqlBulkCopyOptions.TableLock Or SqlClient.SqlBulkCopyOptions.KeepIdentity)
sbc.DestinationTableName = "newTable"
'sbc.BulkCopyTimeout = 0

' CSVDataReader is a custom IDataReader over the source file
sr = New CSVDataReader(parentfileName, theBase64Map, ","c)
sbc.WriteToServer(sr)
sr.Close()

There are quite a number of options available (see the SqlBulkCopy documentation for details).
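
Since the question is in C#, here is a minimal C# sketch of the same approach; it assumes the DataTable dt built in the question, that its columns line up with the destination table's columns, and that connectionString is defined elsewhere:

using System.Data;
using System.Data.SqlClient;

// Assumes "dt" is the populated DataTable from the question.
using (var sbc = new SqlBulkCopy(connectionString,
        SqlBulkCopyOptions.TableLock | SqlBulkCopyOptions.KeepIdentity))
{
    sbc.DestinationTableName = "newTable";
    sbc.BulkCopyTimeout = 0;      // no timeout for a large load
    sbc.WriteToServer(dt);        // one bulk operation instead of 46,000 INSERTs
}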


To bulk-insert data into a database, you should probably be using that database engine's bulk-insert utility (e.g. bcp for SQL Server). You might want to do the processing first, write the processed data out to a separate text file, and then bulk-insert that file into the target database.
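
For example, here is a minimal sketch of that two-step idea, assuming the goal is to write a clean pipe-delimited file that bcp (or BULK INSERT) can then load; the file names and the per-row processing are placeholders:

using System.IO;
using System.Linq;

// Hypothetical pre-processing pass: read the raw file, transform each row,
// and write the result to a separate file for the bulk-load utility.
using (var writer = new StreamWriter("processed.txt"))
{
    foreach (var line in File.ReadLines("input.txt").Skip(1))   // skip the header row
    {
        var fields = line.Split('|');
        // ... apply whatever per-row processing is needed here ...
        writer.WriteLine(string.Join("|", fields));
    }
}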

If you really want to do the processing on-line and insert on-line, memory is also a (small) factor, for example:

  1. ReadAllLines reads the whole text file into memory, creating 46,000 strings, which occupies a sizable chunk of memory. Use ReadLines instead, which returns an IEnumerable<string> and yields one line at a time.
  2. Your DataSet may end up holding all 46,000 rows, which makes detecting changed rows slow. Clear() the DataTable right after each insert.

I believe the slowness you observed actually comes from the DataSet. The data adapter issues one INSERT statement per new row, which means you won't save anything by calling Update() with 1,000 rows at a time versus one row at a time. You still send 46,000 INSERT statements to the database, which is what makes it slow.

To improve performance, I'm afraid LINQ can't help you here, since the bottleneck is the 46,000 INSERT statements. You should:

  1. Forgo the use of datasets
  2. Dynamically create an INSERT statement in a string
  3. Batch the update, say, 100-200 rows per command
  4. Dynamically build the INSERT statement with multiple VALUES rows
  5. Run the SQL command to insert 100-200 rows per batch (a sketch follows below)
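
Here is a rough sketch of steps 2-5, assuming a hypothetical two-column destination table (MyTable, Col1, Col2) and that connectionString and filename are defined elsewhere; real code should parameterize or at least properly escape the values:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.IO;
using System.Linq;

// Sketch: build one INSERT with up to 200 VALUES rows per round trip.
// "MyTable", "Col1" and "Col2" are placeholder names.
const int batchSize = 200;
var rows = new List<string>();

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();

    Action flush = () =>
    {
        if (rows.Count == 0) return;
        string sql = "INSERT INTO MyTable (Col1, Col2) VALUES " + string.Join(", ", rows);
        using (var cmd = new SqlCommand(sql, conn))
            cmd.ExecuteNonQuery();
        rows.Clear();
    };

    foreach (string line in File.ReadLines(filename).Skip(1))   // skip the header row
    {
        var f = line.Split('|');
        // NOTE: naive quoting for illustration only
        rows.Add(string.Format("('{0}', '{1}')", f[0].Replace("'", "''"), f[1].Replace("'", "''")));

        if (rows.Count >= batchSize)
            flush();
    }

    flush();   // insert whatever is left in the last partial batch
}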

If you insist on using datasets, you don't have to use LINQ -- LINQ solves a different kind of problem. Do something like this:

// code to create dataset "ds" and datatable "dt" omitted
// code to create data adaptor omitted

int count = 0;

foreach (string line in File.ReadLines(filename)) {
    // Do processing based on line, perhaps split it
    dt.Rows.Add(...);
    count++;

    // Push rows to the database in batches of 1,000, then clear the table
    if (count >= 1000) {
        adaptor.Update(dt);
        dt.Clear();
        count = 0;
    }
}

// Flush any remaining rows from the last partial batch
if (count > 0) {
    adaptor.Update(dt);
    dt.Clear();
}

This will improve performance somewhat, but you're never going to approach the performance you obtain by using dedicated bulk-insert utilities (or function calls) for your database engine.

Unfortunately, using those bulk-insert facilities will make your code less portable to another database engine. This is the trade-off you'll need to make.
