Increase SQL Query Performance
I have two tables. In the first we enter all types of models, with around 100 rows per model. The second table holds sales data for those items. I need to produce a result like this:
Date Model Total(WE BOUGHT) Sold
---------- ----- ---------------- ----
2011-01-21 M34R 300 200
2011-01-21 M71S 250 22
My query looks like this:
select distinct
CONVERT(varchar(10),x.Scantime,120) as ScanDate,
x.ModelNumber,
( Select count(*)
from micro_model z
where
z.ModelNumber=x.ModelNumber
and CONVERT(varchar(10),z.scantime,101)
= CONVERT(varchar(10),x.Scantime,101)
) as Total,
( select COUNT(*)
from
micro_Model m
inner join micro_model_sold y on m.IDNO=y.IDNO
where
CONVERT(varchar(10),m.scantime,101)
= CONVERT(varchar(10),x.Scantime,101)
and x.ModelNumber=m.ModelNumber
) as Sold
from maxis.dbo.maxis_IMEI_Model x
where
CONVERT(varchar(10),x.scantime,101) between '01/01/2011' and '01/25/2011'
I am able to get that result from the above query, but it takes more than 2 minutes to execute. Please suggest how I can improve the performance. I have heard about pivot tables and indexed views but have never used them.
There are a great many things going on in your query that could be causing problems, and some areas of uncertainty that should be ironed out. For starters, try out this query:
SELECT
DateAdd(Day, DateDiff(Day, 0, X.ScanTime), 0) ScanDate,
X.ModelNumber,
Coalesce(Z.Total, 0) Total,
Coalesce(Z.Sold, 0) Sold
FROM
maxis.dbo.maxis_IMEI_Model X
LEFT JOIN (
SELECT
Z.ModelNumber,
DateAdd(Day, DateDiff(Day, 0, Z.ScanTime), 0) ScanDate,
Count(DISTINCT M.IDNO) Total,
Count(Y.IDNO) Sold
FROM
micro_model Z
LEFT JOIN micro_model_sold Y
ON Z.IDNO = Y.IDNO
GROUP BY
DateDiff(Day, 0, Z.ScanTime),
Z.ModelNumber
) Z
ON X.ModelNumber = Z.ModelNumber
AND X.ScanTime >= Z.ScanDate
AND X.ScanTime < Z.ScanDate + 1
WHERE
X.ScanTime >= '20110101'
AND X.ScanTime < '20110126'
Converting to character in order to do whole-date comparisons (by chopping off the characters that represent the time) is very inefficient. The best-practice way is what I have shown in the WHERE clause. Notice that I incremented the final date by one day, then made that endpoint exclusive by using less-than instead of less-than-or-equal-to (which is what BETWEEN does). All the joins also needed to change. Finally, when it is necessary to remove the time portion of a date, the DateDiff method I show here is best (there is a slightly faster method that is much harder to understand, so I can't recommend it), but if you're using SQL Server 2008 you can just do Convert(date, DateColumn), which is the fastest of all.
Using the date format '01/01/2011' is not region-safe. If your query is ever run on a machine whose language setting has a default date format of DMY, your dates will be interpreted incorrectly, swapping the month and day and generating errors. Use the format yyyymmdd to be safe.
Using correlated subqueries (your SELECT statements inside parentheses that pull in column values from other tables) is awkward and in some cases yields very bad execution plans. Even though the optimizer can often convert these to proper joins, there is no guarantee, and it becomes very hard for other people looking at the query to understand what it is doing. It is better to express such things using outer joins, as shown: I converted the correlated subqueries to derived tables.
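To see the equivalence concretely, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server, with invented toy data (the micro_model / micro_model_sold column names here are simplified guesses at the real schema). It shows that a pair of correlated COUNT subqueries and a single pre-aggregated LEFT JOIN return the same per-day, per-model totals:

```python
import sqlite3

# Toy stand-ins for micro_model / micro_model_sold; names and types are
# assumptions, not the asker's real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE micro_model (idno INTEGER PRIMARY KEY, model TEXT, scandate TEXT);
CREATE TABLE micro_model_sold (idno INTEGER PRIMARY KEY);
INSERT INTO micro_model VALUES
  (1, 'M34R', '2011-01-21'), (2, 'M34R', '2011-01-21'),
  (3, 'M71S', '2011-01-21'), (4, 'M34R', '2011-01-22');
INSERT INTO micro_model_sold VALUES (1), (3);
""")

# Correlated-subquery form: each output column re-scans micro_model per row.
correlated = con.execute("""
SELECT DISTINCT m.scandate, m.model,
  (SELECT COUNT(*) FROM micro_model z
    WHERE z.model = m.model AND z.scandate = m.scandate) AS total,
  (SELECT COUNT(*) FROM micro_model z
    JOIN micro_model_sold s ON s.idno = z.idno
    WHERE z.model = m.model AND z.scandate = m.scandate) AS sold
FROM micro_model m ORDER BY 1, 2
""").fetchall()

# Aggregated form: scan each table once; COUNT(s.idno) skips NULLs from
# the outer join, so unsold units count toward total but not sold.
derived = con.execute("""
SELECT m.scandate, m.model, COUNT(*) AS total, COUNT(s.idno) AS sold
FROM micro_model m
LEFT JOIN micro_model_sold s ON s.idno = m.idno
GROUP BY m.scandate, m.model ORDER BY 1, 2
""").fetchall()

print(correlated == derived)  # True: both give the same per-day counts
```

The aggregated form touches each table once instead of once per output row, which is the shape the derived table in the rewritten query above aims for.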
Using DISTINCT is troubling. Your query shouldn't be returning multiple rows for each model in the first place; if it is, something is logically wrong with the query and you're probably getting incorrect data.
I think I've combined the two correlated subqueries in my derived tables correctly, but I don't have example data or the full schema information, so it is my best guess. In any case, my query should give you ideas.
I completely reformatted your query because it was nearly impossible to see what it was doing. I encourage you to do a little more formatting in your own code; it will help you, and anyone who comes after you, understand what is going on much more quickly. If you ask more SQL questions on this site, please format your code better: use the "code block" button, or simply indent all the code lines by 4 spaces, so the page renders it as a code block.
You know, staring at my query a little more, it's clear that I don't understand the relationship between maxis_IMEI_Model and the other tables. Please explain a bit more what the tables mean and what result you want to see.
It's possible the problems in my query can be solved with a simple GROUP BY and throwing some SUMs on the number columns, but I am not 100% sure. It may be that the maxis_IMEI_Model table needs to go away completely, or to move into its own derived table where it is grouped separately before being joined.
I'm not a SQL expert by any means, but you've got a lot of conversions in there. Why? Why do you need to convert these datetime columns (which is the type I'm assuming for scantime etc.) into strings before comparing them?
I strongly suspect that the conversions are removing any benefit you're getting from what indexes you've got present. (You do have indexes for all the columns involved in the join, right?) In fact, both of your joins look to me like they should be joins on multiple columns without any where clauses... although I'd expect the query optimizer to treat them equivalently if possible.
Look at each and every conversion, and check whether you really need it. I suspect you don't actually need any of them - and the final "between" may even be doing the wrong thing at the moment, given that you're converting into a non-sortable format.
In general - not even just in SQL - it's always worth trying to deal with data in its natural form wherever possible. You're dealing with dates/times - so why treat them as strings for comparison? Conversions are a source of both performance problems and correctness problems.
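This point can be demonstrated with a toy example, again using Python's sqlite3 as a stand-in for SQL Server (the table and data are invented; sqlite stores these datetimes as sortable ISO-8601 text, and strftime('%m/%d/%Y', ...) plays the role of CONVERT(varchar(10), ..., 101)). A half-open range on the native value returns the right rows, while BETWEEN on mm/dd/yyyy strings compares text, not time, and silently loses rows once the range crosses a year boundary:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scans (scantime TEXT)")  # ISO-8601 datetimes
con.executemany("INSERT INTO scans VALUES (?)",
                [("2010-12-31 09:00:00",), ("2011-01-10 14:30:00",),
                 ("2011-01-25 23:59:00",), ("2011-02-02 08:00:00",)])

# Half-open range on the bare column: sortable, index-friendly, and it
# keeps every moment of Jan 25 without any string surgery.
good = con.execute("""
SELECT COUNT(*) FROM scans
WHERE scantime >= '2010-12-01' AND scantime < '2011-01-26'
""").fetchone()[0]

# Mimicking CONVERT(..., 101): mm/dd/yyyy strings don't sort by date, so a
# BETWEEN spanning a year boundary matches nothing at all. Wrapping the
# column in a function like this also defeats index seeks.
broken = con.execute("""
SELECT COUNT(*) FROM scans
WHERE strftime('%m/%d/%Y', scantime)
      BETWEEN '12/01/2010' AND '01/25/2011'
""").fetchone()[0]

print(good, broken)  # 3 0
```

Three scans really fall between 1 Dec 2010 and 25 Jan 2011, but the string comparison finds none of them, because '12/01/2010' sorts after '01/25/2011' as text.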