开发者

Running an SQL query faster

   SELECT projectID, urlID, COUNT(1) AS totalClicks, projectPage,
   (SELECT COUNT(1)
     FROM   tblStatSessionRoutes, tblStatSessions
     WHERE  tblStatSessionRoutes.statSessionID = tblStatSessions.ID
     AND    tblStatSessions.projectID = tblAdClicks.projectID
     AND    (tblStatSessionRoutes.leftPageID = tblAdClicks.projectPage OR
           tblStatSessionRoutes.rightPageID = tblAdClicks.projectPage)) AS totalViews
   FROM   tblAdClicks
   WHERE  projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)
   GROUP  BY projectID, urlID, projectPage
   ORDER  BY CASE projectID
          WHEN 170 THEN
           1
          ELSE
           0
      END, projectID

This is by no means an especially complex query, but because the database is normalised to a good level, and we are dealing with a significant amount of data, this query can be quite slow for the user.

Does anyone have tips on how to improve the speed of it? If I strategically denormalise parts of the database would this help? Will running it in a stored proc offer significant improvements?

The way I handle the data is efficient in my code, the bottleneck really is with this quer开发者_运维问答y.

Thanks!


De-normalising your database should be a last resort since (to choose just one reason) you don't want to encourage data inconsistencies which de-normalisation will allow.

First thing is to see if you can get some clues from the query execution plan. It could be, for example, that your sub-selects are costing too much, and would be better done first into temp tables which you then JOIN in your main query.

Also, if you see lots of table-scans, you could benefit from improved indexes.

If you haven't already, you should spend a few minutes re-formatting your query for readability. It's amazing how often the obvious optimisation will jump out at you while doing this.


I would try to break up that

projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)

and use a JOIN instead:

 SELECT 
     projectID, urlID, COUNT(1) AS totalClicks, projectPage,
     (SELECT COUNT(1) ....) AS totalViews
 FROM
     dbo.tblAdClicks a
 INNER JOIN 
     dbo.tblProjects p ON a.ProjectID = p.ProjectID
 WHERE 
     p.UserID = 5
 GROUP BY 
     a.projectID, a.urlID, a.projectPage
 ORDER BY 
     CASE a.projectID
        WHEN 170 THEN 1
        ELSE 0
     END, a.projectID

Not sure just how much this will help - should help a bit, I hope!

Other than that, I would check if you have indices on the relevant columns, e.g. on a.ProjectID (to help with the JOIN), and maybe on a.urlID and a.ProjectPage (to help with the GROUP BY)


If your dbms has a tool that explains its query plan, use that first. (Your first correlated subquery might be running once per row.) Than make sure every column referenced in a WHERE clause has an index.

This subquery--WHERE projectID IN (SELECT projectID FROM tblProjects WHERE userID = 5)--can surely benefit from being cut and implemented as a view. Then join to the view.

It's not unusual to treat clickstream data as a data warehouse application. If you need to go that route, I'd usually implement a separate data warehouse rather than denormalize a well-designed OLTP database.

I doubt that running it as a stored proc will help you.


I would try to remove the correlated subquery (the inner (SELECT COUNT(1) ...)). Having to join against your session routes where either the left page or the right page matches makes things a bit tricky. Something along the lines of (but I haven't tested this):

SELECT tblAdClicks.projectID, tblAdClicks.urlID, COUNT(1) AS totalClicks, tblAdClicks.projectPage,
       SUM(CASE WHEN leftRoute.statSessionID IS NOT NULL OR rightRoute.statSessionID IS NOT NULL THEN 1 ELSE 0 END) AS totalViews
FROM tblAdClicks
     JOIN tblProjects ON tblProjects.projectID = tblAdClicks.projectID
     LEFT JOIN tblStatSessions ON tblStatSessions.projectID = tblAdClicks.projectID
     LEFT JOIN tblStatSessionRoutes leftRoute ON leftRoute.statSessionID = tblStatSessions.ID AND leftRoute.leftPageID = tblAdClicks.projectPage
     LEFT JOIN tblStatSessionRoutes rightRoute ON rightRoute.statSessionID = tblStatSessions.ID AND rightRoute.rightPageID = tblAdClicks.projectPage
WHERE tblProjects.userID = 5
GROUP BY tblAdClicks.projectID, tblAdClicks.urlID, tblAdClicks.projectPage
ORDER BY CASE tblAdClicks.projectID WHEN 170 THEN 1 ELSE 0 END, tblAdClicks.projectID

If I were to add some cache tables to help this, as I indicated I'd try to reduce the two queries against tblStatSessionRoutes for both left and right page to a single query. If you know that leftPageID will never be equal to rightPageID, it should be possible to simply use a trigger to populate an additional table with the left and right views in separate rows, for example.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜