
MySQL table becoming large

I have a table to which approx. 100,000 rows are added every day, and I am supposed to generate reports from this table. I am using PHP to generate these reports. Recently the script which used to do this has been taking too long to complete. How can I improve performance? Should I shift from MySQL to something else that is more scalable in the long run?


MySQL is very scalable, that's for sure.

The key is not moving from MySQL to another database; instead you should:

  1. Optimize your queries. (This can sound silly, but a huge improvement I made some time ago was changing SELECT * into selecting only the column(s) I actually need. It's a frequent issue I see in other people's code too.)
  2. Optimize your table design (normalization, etc.).
  3. Add indexes on the column(s) you use frequently in your queries (a sketch combining points 1 and 3 follows this list).
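
As a minimal sketch of points 1 and 3, assuming a hypothetical sales table whose created_at column is used to filter the reports:

  -- Index the column used in the report's WHERE clause
  -- (table and column names here are made up for illustration).
  ALTER TABLE sales ADD INDEX idx_created_at (created_at);

  -- Select only the columns the report needs instead of SELECT *.
  SELECT created_at, region, amount
  FROM sales
  WHERE created_at >= '2011-01-01';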

Similar advice here


For generating reports or file downloads with large chunks of data you should consider using flush() and increasing the time limit and memory limit.

I doubt the problem lies in the number of rows, since MySQL can support a lot of rows. But you can of course fetch x rows at a time and process them in chunks, as sketched below.
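
For example, a minimal keyset-pagination sketch (the table and column names here are made up; assume an auto-increment primary key id):

  -- Fetch the next chunk of 10,000 rows after the last id already processed,
  -- then repeat using the highest id returned by this chunk.
  SELECT id, col1, col2
  FROM big_table
  WHERE id > 1230000        -- highest id seen in the previous chunk
  ORDER BY id
  LIMIT 10000;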

I do assume your MySQL is properly tweaked for performance.


First, analyse why (or indeed whether) your queries are slow: http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
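
For example (the table and column names here are hypothetical), prefix the slow report query with EXPLAIN and look at the key and rows columns of the output to see whether an index is used and how many rows will be examined:

  EXPLAIN
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE created_at BETWEEN '2011-01-01' AND '2011-02-01';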


You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes, which are only available with InnoDB.

The example includes a table with 500 million rows with query times of 0.02 seconds.
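
As a rough sketch of the idea (illustrative names only, not the linked example): in InnoDB the primary key is the clustered index, so a composite primary key that leads with the reporting column keeps each report's rows physically close together on disk.

  CREATE TABLE logdata_clustered (
    dateandtime   DATETIME NOT NULL,
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    some_quantity INT NOT NULL,
    PRIMARY KEY (dateandtime, id),   -- clustered: rows stored in time order
    KEY (id)                         -- needed because id is AUTO_INCREMENT
  ) ENGINE=InnoDB;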

MySQL and NoSQL: Help me to choose the right one

Hope you find this of interest.


Another thought is to move records beyond a certain age to a historical database for archiving, reporting, etc. If you don't need that large volume for transactional processing it might make sense to extract them from the transactional data store.

It's common to separate transactional and reporting databases.
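
A minimal sketch of that archiving step, assuming a hypothetical orders table with a created_at column (in practice you would do this in batches or inside a transaction):

  -- Create an archive table with the same structure as the live table.
  CREATE TABLE orders_archive LIKE orders;

  -- Use a fixed cutoff so the copy and the delete cover exactly the same rows.
  SET @cutoff = NOW() - INTERVAL 90 DAY;

  INSERT INTO orders_archive
  SELECT * FROM orders WHERE created_at < @cutoff;

  DELETE FROM orders WHERE created_at < @cutoff;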


I am going to make some assumptions:

  • Your 100k rows added every day have timestamps which are either real-time, or are offset by a relatively short amount of time (hours at most); your 100k rows are added either throughout the day or in a few big batches.
  • The data are never updated
  • You are using the InnoDB engine (frankly, you would be insane to use MyISAM for large tables, because in the event of a crash the index rebuild takes a prohibitively long time)

You haven't explained what kind of reports you're trying to generate, but I'm assuming that your table looks like this:

 CREATE TABLE logdata (
   dateandtime some_timestamp_type NOT NULL,
   property1 some_type_1 NOT NULL,
   property2 some_type_2 NOT NULL,
   some_quantity some_numerical_type NOT NULL,
   ... some other columns not required for reports ...
   ... some indexes ...
 );

And that your reports look like:

SELECT count(*), SUM(some_quantity), property1 FROM logdata WHERE dateandtime BETWEEN some_time_range GROUP BY property1;
SELECT count(*), SUM(some_quantity), property2 FROM logdata WHERE dateandtime BETWEEN some_time_range GROUP BY property2;

Now, as we can see, both of these reports are doing a scan of a large amount of the table, because you are reporting on a lot of rows.

The bigger the time range becomes, the slower the reports will be. Moreover, if you have a lot of OTHER columns (say some varchars or blobs) which you aren't interested in reporting on, then they slow your report down too (because the server still needs to inspect the rows).

You can use several possible techniques for speeding this up:

  • Add a covering index for each type of report, to include the columns you need and omit the columns you don't (see the index sketch after this list). This may help a lot but will slow inserts down.
  • Summarise data according to the dimension(s) that you want to report on. In this fictitious case, all your reports are either counting rows or SUM()ing some_quantity (a summary-table sketch follows below).
  • Build mirror tables (containing the same data) which have appropriate primary keys / indexes / columns to make the reports faster.
  • Use a column-oriented engine (e.g. Infobright).
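
For example, a covering index for the first fictitious report might look like the sketch below: the dateandtime range predicate leads the index, and property1 and some_quantity are included so the query can be answered from the index alone, without touching the wide rows.

  ALTER TABLE logdata
    ADD INDEX idx_report_property1 (dateandtime, property1, some_quantity);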

Summarisation is usually an attractive option if your use-case supports it.
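
A minimal summary-table sketch for the first fictitious report, assuming property1 is an INT and some_quantity is a DECIMAL (use whatever types they really have):

  CREATE TABLE logdata_summary_by_property1 (
    report_date    DATE NOT NULL,
    property1      INT NOT NULL,
    row_count      BIGINT NOT NULL,
    quantity_total DECIMAL(18,2) NOT NULL,
    PRIMARY KEY (report_date, property1)
  );

  -- Roll up yesterday's rows once a day (e.g. from a cron job).
  INSERT INTO logdata_summary_by_property1
  SELECT DATE(dateandtime), property1, COUNT(*), SUM(some_quantity)
  FROM logdata
  WHERE dateandtime >= CURDATE() - INTERVAL 1 DAY
    AND dateandtime <  CURDATE()
  GROUP BY DATE(dateandtime), property1;

  -- Reports then read the small summary table instead of scanning logdata.
  SELECT SUM(row_count), SUM(quantity_total), property1
  FROM logdata_summary_by_property1
  WHERE report_date BETWEEN '2011-01-01' AND '2011-01-31'
  GROUP BY property1;

The summary table stays tiny compared with the raw table, so date-range reports over it remain fast no matter how large logdata grows.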

You may wish to ask a more detailed question with an explanation of your use-case.


The time limit can be temporarily turned off for a particular script, if you know that it may run over the limit, by calling set_time_limit(0); at the start of the script.

Other considerations such as indexing or archiving very old data to a different table should also be looked at.


Your best bet is something like MongoDB or CouchDB, both of which are non-relational databases oriented toward storing massive amounts of data. This is assuming that you've already tweaked your MySQL installation for performance and that your situation wouldn't benefit from parallelization.
