Optimizing MySQL inserts to handle a data stream
I am consuming a high-rate data stream and performing the following steps to store the data in a MySQL database. For each newly arriving item:
- (1) Parse incoming item.
- (2) Execute several "INSERT ... ON DUPLICATE KEY UPDATE" statements.
I have used INSERT ... ON DUPLICATE KEY UPDATE to eliminate one additional round-trip to the database.
While trying to improve the overall performance, I have considered doing bulk updates in the following way:
- (1) Parse incoming item.
- (2) Generate SQL statement with "INSERT ... ON DUPLICATE KEY UPDATE" and append to a file.
- (3) Periodically flush the SQL statements in the file to the database.
Two questions:
- (1) Will this have a positive impact on the database load?
- (2) How should I flush the statements to the database so that the indices are only rebuilt after the complete flush (using transactions)?
UPDATE: I am using Perl DBI + MySQL MyISAM.
Thanks in advance for any comments.
If your data does not need to go into the database immediately, you can cache your insert data somewhere and then issue one larger insert statement, e.g.:
    INSERT INTO table_name (x, y, z) VALUES (x1, y1, z1), (x2, y2, z2), ... (xN, yN, zN) ON DUPLICATE KEY UPDATE ...;
To be clear, I would maintain a list of pending inserts, in this case a list of (x, y, z) triplets. Then, once the list reaches some threshold (N), you generate the insert statement and issue it.
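Here is a rough sketch of that buffering with Perl DBI; the connection details and the table my_table(x, y, z) are placeholders, and I assume x is the unique key:

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details and schema; adjust to your own.
    my $dbh = DBI->connect("DBI:mysql:database=mydb;host=localhost",
                           "user", "password", { RaiseError => 1 });

    my $N = 1000;       # flush threshold
    my @pending;        # buffered (x, y, z) triplets

    # Queue one row; flush automatically once the buffer reaches N rows.
    sub queue_row {
        my ($x, $y, $z) = @_;
        push @pending, [ $x, $y, $z ];
        flush_pending() if @pending >= $N;
    }

    # Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE and issue it.
    sub flush_pending {
        return unless @pending;
        my $rows = join ", ", ("(?, ?, ?)") x @pending;
        my $sql  = "INSERT INTO my_table (x, y, z) VALUES $rows "
                 . "ON DUPLICATE KEY UPDATE y = VALUES(y), z = VALUES(z)";
        $dbh->do($sql, undef, map { @$_ } @pending);
        @pending = ();
    }

    # Remember to call flush_pending() one last time when the stream ends.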
I have no accurate timing figures for you, but this increased performance roughly 10 times when compared to inserting each row individually.
I haven't experimented much with the value of N either, but I found 1000 to work nicely. I expect the optimal value depends on hardware and database settings.
Hope this helps (I am also using MyISAM).
You don't say what kind of database access environment (Perl DBI? JDBC? ODBC?) you're running in, or what kind of table storage engine (MyISAM? InnoDB?) you're using.
First of all, you're right to pick INSERT ... ON DUPLICATE KEY UPDATE. Good move, unless you can guarantee unique keys.
Secondly, if your database access environment allows it, you should use prepared statements. You definitely won't get good performance by writing a bunch of statements into a file and then making a database client read the file back in. Do the INSERT operations directly from the software package that consumes the incoming data stream.
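With Perl DBI (per your update), that looks roughly like the sketch below; the connection details, the table stream_items, and its columns are placeholders:

    use strict;
    use warnings;
    use DBI;

    # Placeholder connection details and schema; adjust to your own.
    my $dbh = DBI->connect("DBI:mysql:database=streamdb;host=localhost",
                           "user", "password", { RaiseError => 1 });

    # Prepare once, then execute repeatedly as items arrive.
    my $sth = $dbh->prepare(q{
        INSERT INTO stream_items (item_id, payload, seen_count)
        VALUES (?, ?, 1)
        ON DUPLICATE KEY UPDATE payload    = VALUES(payload),
                                seen_count = seen_count + 1
    });

    # Call this directly from the code that parses the stream.
    sub store_item {
        my ($item_id, $payload) = @_;
        $sth->execute($item_id, $payload);
    }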
Thirdly, pick the right kind of table storage engine. MyISAM inserts are going to be faster than InnoDB, so if you're logging data and retrieving it later that will be a win. But InnoDB has better transactional integrity. If you're really handling tonnage of data, and you don't need to read it very often, consider the ARCHIVE storage engine.
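The engine is chosen per table when you create it, for example (schema purely illustrative):

    # Illustrative only -- swap in ENGINE=InnoDB if transactional
    # integrity matters more than raw insert speed.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS stream_items (
            item_id    BIGINT NOT NULL PRIMARY KEY,
            payload    TEXT,
            seen_count INT    NOT NULL DEFAULT 1
        ) ENGINE=MyISAM
    });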
Finally, consider doing a START TRANSACTION at the beginning of a batch of INSERT ... commands, then doing a COMMIT and another START TRANSACTION after a fixed number of rows, like 100 or so. If you're using InnoDB, this will speed things up a lot. If you're using MyISAM or ARCHIVE, it won't matter.
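Using Perl DBI and the prepared handle from the sketch above, the batching might look like this; next_item() is a stand-in for your stream reader, and 100 is just the example batch size:

    # Only pays off on a transactional engine such as InnoDB; on MyISAM
    # or ARCHIVE the begin_work/commit calls have no practical effect.
    my $batch_size = 100;
    my $count      = 0;

    $dbh->begin_work;                      # START TRANSACTION
    while (my $item = next_item()) {
        $sth->execute($item->{id}, $item->{payload});
        if (++$count % $batch_size == 0) {
            $dbh->commit;                  # flush this batch
            $dbh->begin_work;              # start the next one
        }
    }
    $dbh->commit;                          # commit any remaining rows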
Your big wins will come from the prepared statement stuff and the best choice of storage engine.