
MySQL table becoming large

I have a table to which approx. 100,000 rows are added every day, and I am supposed to generate reports from this table. I am using PHP to generate these reports. Recently the script which used to do this has been taking too long to complete. How can I improve performance? Should I shift from MySQL to something else that is more scalable in the long run?


MySQL is very scalable, that's for sure.

The key is not moving from MySQL to another database; instead you should:

  1. Optimize your queries. (This can sound silly, but a huge improvement I made some time ago was changing SELECT * into selecting only the column(s) I actually need. It's a frequent issue I see in other people's code too.)
  2. Optimize your table design (normalization, etc.).
  3. Add indexes on the column(s) you use frequently in your queries (a sketch combining points 1 and 3 follows this list).
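
As a minimal sketch of points 1 and 3, assuming a hypothetical sales table whose created_at column is used to filter the reports:

  -- Index the column used in the report's WHERE clause
  -- (table and column names here are made up for illustration).
  ALTER TABLE sales ADD INDEX idx_created_at (created_at);

  -- Select only the columns the report needs instead of SELECT *.
  SELECT created_at, region, amount
  FROM sales
  WHERE created_at >= '2011-01-01';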

Similar advice here


For generating reports or file downloads with large chunks of data you should consider using flush() and increasing the time limit and memory limit.

I doubt the problem lies in the number of rows, since MySQL can support a lot of rows. But you can of course fetch x rows at a time and process them in chunks, as sketched below.
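
For example, a minimal keyset-pagination sketch (the table and column names here are made up; assume an auto-increment primary key id):

  -- Fetch the next chunk of 10,000 rows after the last id already processed,
  -- then repeat using the highest id returned by this chunk.
  SELECT id, col1, col2
  FROM big_table
  WHERE id > 1230000        -- highest id seen in the previous chunk
  ORDER BY id
  LIMIT 10000;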

I do assume your MySQL is properly tweaked for performance.


First, analyse why (or indeed whether) your queries are slow: http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
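
For example (the table and column names here are hypothetical), prefix the slow report query with EXPLAIN and look at the key and rows columns of the output to see whether an index is used and how many rows will be examined:

  EXPLAIN
  SELECT COUNT(*), SUM(amount)
  FROM orders
  WHERE created_at BETWEEN '2011-01-01' AND '2011-02-01';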


You should read the following and learn a little about the advantages of a well-designed InnoDB table and how best to use clustered indexes, which are only available with InnoDB.

The example includes a table with 500 million rows with query times of 0.02 seconds.
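
As a rough sketch of the idea (illustrative names only, not the linked example): in InnoDB the primary key is the clustered index, so a composite primary key that leads with the reporting column keeps each report's rows physically close together on disk.

  CREATE TABLE logdata_clustered (
    dateandtime   DATETIME NOT NULL,
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    some_quantity INT NOT NULL,
    PRIMARY KEY (dateandtime, id),   -- clustered: rows stored in time order
    KEY (id)                         -- needed because id is AUTO_INCREMENT
  ) ENGINE=InnoDB;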

MySQL and NoSQL: Help me to choose the right one

Hope you find this of interest.


Another thought is to move records beyond a certain age to a historical database for archiving, reporting, etc. If you don't need that large volume for transactional processing it might make sense to extract them from the transactional data store.

It's common to separate transactional and reporting databases.
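
A minimal sketch of that archiving step, assuming a hypothetical orders table with a created_at column (in practice you would do this in batches or inside a transaction):

  -- Create an archive table with the same structure as the live table.
  CREATE TABLE orders_archive LIKE orders;

  -- Use a fixed cutoff so the copy and the delete cover exactly the same rows.
  SET @cutoff = NOW() - INTERVAL 90 DAY;

  INSERT INTO orders_archive
  SELECT * FROM orders WHERE created_at < @cutoff;

  DELETE FROM orders WHERE created_at < @cutoff;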


I am going to make some assumptions:

  • Your 100k rows added every day have timestamps which are either real-time, or are offset by a relatively short amount of time (hours at most); your 100k rows are added either throughout the day or in a few big batches.
  • The data are never updated
  • You are using the InnoDB engine (frankly, you would be insane to use MyISAM for large tables, because in the event of a crash the index rebuild takes a prohibitively long time)

You haven't explained what kind of reports you're trying to generate, but I'm assuming that your table looks like this:

 CREATE TABLE logdata (
   dateandtime some_timestamp_type NOT NULL,
   property1 some_type_1 NOT NULL,
   property2 some_type_2 NOT NULL,
   some_quantity some_numerical_type NOT NULL,
   ... some other columns not required for reports ...
   ... some indexes ...
 );

And that your reports look like:

SELECT count(*), SUM(some_quantity), property1 FROM logdata WHERE dateandtime BETWEEN some_time_range GROUP BY property1;
SELECT count(*), SUM(some_quantity), property2 FROM logdata WHERE dateandtime BETWEEN some_time_range GROUP BY property2;

Now, as we can see, both of these reports are doing a scan of a large amount of the table, because you are reporting on a lot of rows.

The bigger the time range becomes, the slower the reports will be. Moreover, if you have a lot of OTHER columns (say some varchars or blobs) which you aren't interested in reporting on, then they slow your report down too (because the server still needs to inspect the rows).

You can use several possible techniques for speeding this up:

  • Add a covering index for each type of report, to include the columns you need and omit the columns you don't (see the index sketch after this list). This may help a lot but will slow inserts down.
  • Summarise data according to the dimension(s) that you want to report on. In this fictitious case, all your reports are either counting rows or SUM()ing some_quantity (a summary-table sketch follows below).
  • Build mirror tables (containing the same data) which have appropriate primary keys / indexes / columns to make the reports faster.
  • Use a column-oriented engine (e.g. Infobright).
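
For example, a covering index for the first fictitious report might look like the sketch below: the dateandtime range predicate leads the index, and property1 and some_quantity are included so the query can be answered from the index alone, without touching the wide rows.

  ALTER TABLE logdata
    ADD INDEX idx_report_property1 (dateandtime, property1, some_quantity);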

Summarisation is usually an attractive option if your use-case supports it.
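
A minimal summary-table sketch for the first fictitious report, assuming property1 is an INT and some_quantity is a DECIMAL (use whatever types they really have):

  CREATE TABLE logdata_summary_by_property1 (
    report_date    DATE NOT NULL,
    property1      INT NOT NULL,
    row_count      BIGINT NOT NULL,
    quantity_total DECIMAL(18,2) NOT NULL,
    PRIMARY KEY (report_date, property1)
  );

  -- Roll up yesterday's rows once a day (e.g. from a cron job).
  INSERT INTO logdata_summary_by_property1
  SELECT DATE(dateandtime), property1, COUNT(*), SUM(some_quantity)
  FROM logdata
  WHERE dateandtime >= CURDATE() - INTERVAL 1 DAY
    AND dateandtime <  CURDATE()
  GROUP BY DATE(dateandtime), property1;

  -- Reports then read the small summary table instead of scanning logdata.
  SELECT SUM(row_count), SUM(quantity_total), property1
  FROM logdata_summary_by_property1
  WHERE report_date BETWEEN '2011-01-01' AND '2011-01-31'
  GROUP BY property1;

The summary table stays tiny compared with the raw table, so date-range reports over it remain fast no matter how large logdata grows.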

You may wish to ask a more detailed question with an explanation of your use-case.


The time limit can be temporarily turned off for a particular script, if you know that it may run over the limit, by calling set_time_limit(0); at the start of the script.

Other considerations such as indexing or archiving very old data to a different table should also be looked at.


Your best bet is something like MongoDB or CouchDB, both of which are non-relational databases oriented toward storing massive amounts of data. This is assuming that you've already tweaked your MySQL installation for performance and that your situation wouldn't benefit from parallelization.
