开发者

MySQL: Optimizing SELECT from 6 huge identical tables with different data split up by timestamp

please I have the same problem as I found here

MySQL - Selecting data from multiple tables all with same structure but different data ,

I have to select data from many MySQL tables with identical structure, but different data (split up into table_0, table_1, table_2 etc to table_5 to distribute开发者_高级运维 millions of records of data).

The hardware generating the data records for each device moves from table to table according to timestamp field, which is NOT unique. e.g. 50 records in table_0 may have the same timestamp. When the data gets to the end of table_5, it goes back to table_0 to start overwriting the data there. I need to get the data on each device within a time range.

Each table's data columns (for table_0, table_1... up to table_5):

timestamp, robotGroupID, robotID, sensor1, sensor2, sensor3, ... (many of them)

However the tables are HUGE and the UNION ALL (I read its faster than DISTINCT) takes forever to execute, even with just two tables let alone 6. e.g. I will illustrate for two tables below.

MySQL statement in PHP: (illustrated for just sensor 1, sensor 2 and sensor 3)

(SELECT sensor1, sensor2, sensor3 FROM table_0 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)

UNION ALL

(SELECT sensor1, sensor2, sensor3 FROM table_1 WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop)

N.B it is the exact same query except for the table name. Sensor data for a robot within a time range may span none, one, or more of the tables at once.

I cannot use LIMIT because the number of reports from robots within each time range cannot be known ahead of time. I cant use the MERGE STORAGE ENGINE cos I only have read-only access to the company's database.

I have an idea to use count(robotID) or so on each table to check before running queries but Im not sure how to go about this cos I'm quite a novice.

Please how do you think I can make this work faster for 6 tables and many more columns since there are many more columns than illustrated? Thanks in advance!


Are the fields RobotID and Timestamp indexed?

I would add a multi-field index of ( RobotId, timestamp ) at the very least.

You say you have read only access to the tables, so can you request this index to be added? I'm sure it will help in both your original and updated queries posted.


I must confess Im still a novice PHP/MySQL coder, but with many ideas; so my code is probably "dirty".

So I solved the problem this way in order to move forward, but please better solutions are welcome. As for any strange syntax, I am using a database class built upon the PHP PDO because I am using many different RBDMS types on this project.

For the $myQuery_start variable, I added the names of the other columns as well as sensors 1 to 3.
$myQuery_start = "(SELECT sensor1, sensor2, sensor3 FROM ";
$myQueryCount_start = "(SELECT COUNT(*) FROM ";
$myQuery_stop = " WHERE robotID=".$robotID." AND timestamp BETWEEN ".$timeStampStart." AND ".$timeStampStop.")";

$count_0 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_0".$myQuery_stop)->fetchColumn();
$count_1 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_1".$myQuery_stop)->fetchColumn();
$count_2 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_2".$myQuery_stop)->fetchColumn();
$count_3 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_3".$myQuery_stop)->fetchColumn();
$count_4 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_4".$myQuery_stop)->fetchColumn();
$count_5 = DB::getDB("mysql", $myDB)->query($myQueryCount_start."table_5".$myQuery_stop)->fetchColumn();

And now I check to see if UNION ALL needs to be appended to each table's query or not. No need to have a UNION ALL if there is no data record to attach in the next table.
$union_0 = (($count_1 + $count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_1 = (($count_2 + $count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_2 = (($count_3 + $count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_3 = (($count_4 + $count_5) > 0)?" UNION ALL ":"";
$union_4 = (($count_5) > 0)?" UNION ALL ":"";

and now we build up the table queries and combine to form the full query

$query_0 = ($count_0 > 0)?$myQuery_start."ip_minute_stats_0".$myQuery_stop.$union_0:"";
$query_1 = ($count_1 > 0)?$myQuery_start."ip_minute_stats_1".$myQuery_stop.$union_1:"";
$query_2 = ($count_2 > 0)?$myQuery_start."ip_minute_stats_2".$myQuery_stop.$union_2:"";
$query_3 = ($count_3 > 0)?$myQuery_start."ip_minute_stats_3".$myQuery_stop.$union_3:"";
$query_4 = ($count_4 > 0)?$myQuery_start."ip_minute_stats_4".$myQuery_stop.$union_4:"";
$query_5 = ($count_5 > 0)?$myQuery_start."ip_minute_stats_5".$myQuery_stop:"";

Then concatenated:
$myQuery = $query_0.$query_1.$query_2.$query_3.$query_4.$query_5;
And finally $myQuery is executed to produce all the data as required.

At least this is roughly 8 times faster than the previous way I used UNION ALL, so I think this is valid. Any suggested further optimization?


If you can convince them to let you change the database structure, you can GREATLY optimize the layout of your database with the help of MySQL Partitioning. You'll want to research "Range Partitioning", and set up partitioning rules that will tell MySQL to automatically sort your data into invisible subtables for way quicker SELECT results. You won't even need multiple tables.

See http://dev.mysql.com/doc/refman/5.1/en/partitioning-overview.html

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜