
MySQL Insert performance degrades on a large table

I'm working with a huge table which has 250+ million rows. The schema is simple.

CREATE TABLE MyTable (
        id BIGINT PRIMARY KEY AUTO_INCREMENT,
        oid INT NOT NULL,
        long1 BIGINT NOT NULL,
        str1 VARCHAR(30) DEFAULT NULL,
        str2 VARCHAR(30) DEFAULT NULL,
        str3 VARCHAR(200) DEFAULT NULL,
        str4 VARCHAR(50) DEFAULT NULL,
        int1 INT(6) DEFAULT NULL,
        str5 VARCHAR(300) DEFAULT NULL,
        date1 DATE DEFAULT NULL,
        date2 DATE DEFAULT NULL,
        lastUpdated TIMESTAMP NOT NULL,
        hashcode INT NOT NULL,
        active TINYINT(1) DEFAULT 1,
        KEY oid(oid),
        KEY lastUpdated(lastUpdated),
        UNIQUE KEY (hashcode, active),
        KEY (active)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=1000000000;

Insert performance has dropped significantly. Up to about 150 million rows in the table, it used to take 5-6 seconds to insert 10,000 rows; now it takes 2-4 times as long. InnoDB's ibdata file has grown to 107 GB. The InnoDB configuration parameters are as follows.

innodb_buffer_pool_size = 36G # Machine has 48G memory
innodb_additional_mem_pool_size = 20M
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_file_size = 50M
innodb_log_buffer_size = 20M
innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
expire_logs_days = 4

IO wait time has gone up, as seen with top. I have tried changing the flush method to O_DSYNC, but it didn't help. The disk is carved out of a hardware RAID 10 setup. In an earlier setup with a single disk, IO was not a problem.

Is partitioning the table the only option? Can splitting the single 100 GB file into smaller files help? Are there any variables that need to be tuned for RAID?

Update: This is a test system. I have the freedom to make any changes required.


You didn't say whether this was a test system or production; I'm assuming it's production.

It is likely that you've got the table to a size where its indexes (or the whole lot) no longer fit in memory.

This means that InnoDB must read pages in during inserts (depending on the distribution of your new rows' index values). Reading pages (random reads) is really slow and needs to be avoided if possible.
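A quick way to sanity-check this (the figures are estimates, and this assumes the table lives in the schema you're connected to) is to compare the table's index size against innodb_buffer_pool_size:

SELECT ROUND(data_length/1024/1024/1024, 1)  AS data_gb,
       ROUND(index_length/1024/1024/1024, 1) AS index_gb
FROM   information_schema.TABLES
WHERE  table_schema = DATABASE()
  AND  table_name = 'MyTable';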

Partitioning seems like the most obvious solution, but MySQL's partitioning may not fit your use-case.
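If it does fit, a rough sketch might look like the following - bear in mind that MySQL requires every unique key (including the primary key) to contain all columns in the partitioning expression, so your UNIQUE KEY (hashcode, active) would have to be dropped or redefined first, and the ranges here are arbitrary:

-- Rough sketch only; adjust the ranges to your data distribution.
ALTER TABLE MyTable
    PARTITION BY RANGE (id) (
        PARTITION p0 VALUES LESS THAN (100000000),
        PARTITION p1 VALUES LESS THAN (200000000),
        PARTITION p2 VALUES LESS THAN (300000000),
        PARTITION p3 VALUES LESS THAN MAXVALUE
    );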

You should certainly consider all possible options - get the table onto a test server in your lab to see how it behaves.

Your primary key looks to me as if it's possibly not required (you have another unique index), so eliminating that is one option.

Also consider the InnoDB plugin and compression; this will make your innodb_buffer_pool go further.
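With the plugin, something along these lines enables compressed pages (innodb_file_per_table and the Barracuda file format are prerequisites; the 8K block size is only a starting point to test):

-- Requires innodb_file_per_table = 1 and innodb_file_format = Barracuda.
ALTER TABLE MyTable ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;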

You really need to analyse your use-cases to decide whether you actually need to keep all this data, and whether partitioning is a sensible solution.

Making any changes to this application is likely to introduce new performance problems for your users, so you want to be really careful here. If you find a way to improve insert performance, it may reduce search performance or the performance of other operations. You will need to do a thorough performance test on production-grade hardware before releasing such a change.


In my experience, InnoDB seems to hit a limit for write-intensive systems even if you have a really optimized disk subsystem. I am surprised you managed to get it up to 100 GB.

This is what Twitter ran into a while ago, when it realized it needed to shard - see http://github.com/twitter/gizzard.

This all depends on your use cases, but you could also move from MySQL to Cassandra, as it performs really well for write-intensive applications (http://cassandra.apache.org).


As MarkR commented above, insert performance gets worse when indexes can no longer fit in your buffer pool. InnoDB has a random-IO reduction mechanism (called the insert buffer) which prevents some of this problem - but it will not work on your UNIQUE index. The index on (hashcode, active) has to be checked on each insert to make sure no duplicate entries are inserted. If the hashcode does not 'follow' the primary key, this checking could be random IO.
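You can see how much the insert buffer is (or is not) absorbing in the INSERT BUFFER AND ADAPTIVE HASH INDEX section of:

SHOW ENGINE INNODB STATUS\G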

Do you have the possibility to change the schema?

Your best bet is to:

(a) Make hashcode somewhat sequential, or sort by hashcode before bulk inserting (this by itself will help, since random reads will be reduced).

(b) Make (hashcode,active) the primary key - and insert data in sorted order. I am guessing your application probably reads by hashcode - and a primary key lookup is faster.
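A rough sketch of (b), assuming the unique index's auto-generated name is 'hashcode' (check SHOW CREATE TABLE MyTable first) and that nothing else relies on the id column:

-- 'hashcode' here is MySQL's auto-generated name for UNIQUE KEY (hashcode, active).
ALTER TABLE MyTable
    DROP PRIMARY KEY,
    DROP COLUMN id,
    DROP KEY hashcode,
    ADD PRIMARY KEY (hashcode, active);

Bulk loads should then be sorted by (hashcode, active) so new rows land close together in the clustered index instead of triggering random page reads.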


You didn't mention what your workload is like, but if there are not too many reads or you have enough main memory, another option is to use a write-optimized backend for MySQL instead of InnoDB. Tokutek claims 18x faster inserts and a much flatter performance curve as the dataset grows.

tokutek.com

http://tokutek.com/downloads/tokudb-performance-brief.pdf


Increase innodb_log_file_size from 50M to 500M.

And innodb_flush_log_at_trx_commit should be 0 if you can bear up to 1 second of data loss.


I'll second @MarkR's comments about reducing the indexes. One other thing you should look at is increasing your innodb_log_file_size. It increases the crash recovery time, but should help. Be aware you need to remove the old files before you restart the server.
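Something like this in my.cnf - with the caveat that 500M is a suggestion rather than a measured optimum, and that after a clean shutdown the old ib_logfile0 / ib_logfile1 must be moved out of the data directory so InnoDB can recreate them at the new size:

# Larger redo logs smooth out write-heavy workloads; the log files must be recreated.
innodb_log_file_size = 500M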

General InnoDB tuning tips: http://www.mysqlperformanceblog.com/2007/11/01/innodb-performance-optimization-basics/

You should also be aware of LOAD DATA INFILE for doing inserts. It's much faster.
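For example, a hypothetical bulk load from a tab-separated file (the path and the column list are placeholders matching the schema above; id is omitted because it is AUTO_INCREMENT):

LOAD DATA INFILE '/tmp/mytable_batch.tsv'
INTO TABLE MyTable
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(oid, long1, str1, str2, str3, str4, int1, str5,
 date1, date2, lastUpdated, hashcode, active);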
