开发者

php mysql optimization

I have been assigned the task of creating some graphical stats for a website, out of some saved data.

Facts: - there are 3 databases in use. dbCurrent, dbStats, dbBackup. dbCurrent is the main database of the website dbStats hold various tables of statistics and tracking data dbBackup holds the last five years stats/tracking tables.

  • the data i will use come from two databases ( dbStats, dbBackup )
  • the table names are: stats2006, stats2007, stats2008, etc, except the current stats which is just "stats". Each table has data for its year.
  • the tabl开发者_开发知识库e structure for each year of data is the same: primaryID field is integer productID field is integer dateMonitor field is integer (unixtimestamp) pageName field is varchar (20)
  • productID, dateMonitor, pageName fields have also indexes

in other words, what product was viewed on what date and from what page.

So, what i thought is create a loop out of each table and get my data. Each query of mine looks like:

Select COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions

where $tablename and $conditions are variables based on each loop. All conditions are similar to:

  • dateMonitor between date1 and date2
  • pageName='some val'
  • productID IN ($comma_separated_values)
  • combination of the above

All of these are working decently so far (for a single product).

When i try to create a report to compare 'x' products in 'y' years (chosen dynamically from admin/moderator) the script runs for more than 15 minutes.

I am looking for a way to improve the performance of the script. Logic/structure i use so far, follows:

Loop through products to find the ids to use (typical format is: x,y,z (comma separated values)
Open Loop through years/months
Execute one sql query for each affected table/database to get the number of affected rows.
Close year loop
Send data to graph script (jquery jqPlot to be exact) to print on screen

Any help/idea appreciated.

EDIT: Based on @Narf suggestion's with UNION ALL, i constructed 1 single query based on 12 sub-select statements:

SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='1167606001' AND dateMonitor<='1170284399' AND dateMonitor='test' 
UNION ALL 
SELECT COUNT(*) AS monthlyTotal FROM db1.table2 WHERE dateMonitor>='1170284401' AND dateMonitor<='1172703599' AND dateMonitor='test' ...

Each select statement refers to a single month duration. Demo code:

for ($m=1; $m<=12; $m++)
{
$startDate = mktime(0, 0, 1, $m, 1, $myYear);
$daysOfMonth = date("t", mktime(10, 10, 10, $m, 10, $myYear));
$endDate = mktime(23, 59, 59, $m, $daysOfMonth, $myYear);

$query_chk1 .= "SELECT COUNT(*) AS monthlyTotal FROM db1.table1 WHERE dateMonitor>='$startDate' AND dateMonitor<='$endDate' AND pageName='test' UNION ALL ";
}

$query_chk1 = substr($query_chk1, 0, -10);

EDIT2: after creating combined indexes (as suggested by @ypercube), I see some slight decreased time in execution time.

Now i have an average execution time of 11 min (original time was 15-17 min)

This helped a lot to decrease execution time.

Thank you.


There's not much that you can do, at least since you've indexed all your columns ... here's the best that I can come up with:

SELECT COUNT(*)
FROM `stats`
WHERE `productID IN(1,2,3)
    AND `dateMonitor` >= <unixtime from>
    AND `dateMonitor` <= <unixtime to>
    AND `pageName`='<value>'

... and how:

  • As ypercube has commented - using COUNT(*) is faster.
  • I don't know this for sure, but I believe that using >= and <= instead of BETWEEN for integers should be faster.

Another thing that you should try is executing all the queries (if more than one) at once. It would be harder for me to explain it correctly in words, and I see that you have a good grasp of SQL, so you should be able to get the logic, so here's an example:

Let's say that we need to search for products with ids of 123, 13, 5 and 6 from May 2006 through April 2008, and pageName 'test':

  • We calculate the timestamps prior to generating the query and determine exactly which tables we need to search in.

    SELECT COUNT(*) AS myCounter FROM stats2006 WHERE productID IN(5,6,13,123) AND dateMonitor >= 1146430800 AND pageName='test'

    /* Here we only need to check the timestamp against May 1st 2006, 00:00:00 */

    UNION ALL

    SELECT COUNT(*) AS myCounter FROM stats2007 WHERE productID IN(5,6,13,123) AND pageName='test'

    /* Here we don't need to check the dateMonitor field because the whole year matches our period */

    UNION ALL

    SELECT COUNT(*) AS myCounter FROM stats2008 WHERE productID IN(5,6,13,123) AND dateMonitor <= 1209589199 AND pageName='test'

    /* Here we only need to check the timestamp against April 30th 2008, 23:59:59 */


When you so compare 'x' products in 'y' years why don't you use GROUP BY? Eg:

Select productID, COUNT(primaryID) as myCounter FROM $tablename WHERE $conditions GROUP BY productID

this will cut amount of quires and should speed up the process.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜