Looking for some help improving the performance of this SQL query
I'm not a real big TSSQL expert. We have a query that returns almost 2 million records. It takes over 2.5 minutes to run. We've added indexes, and that shaved off one minute (used to take 3.5 minutes to run), but 2.5 minutes is still too slow. Can anyone show me how to modify this query to improve it's performance? I suspect the change will have to occur with the "in" statement, and with the way the "LOCATION" value is being calculated, but I'm not sure how to go about doing it.
select distinct
A.ptid,
A.ptmgrx,
A.ptmgry,
A.fename,
B.fename XFENAME,
A.rt1 route,
A.pr,
A.mp,
A.rdbranch,
A.gs grade,
A.cs,
A.csmp,
A.cspath,
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdl) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyl) + ' County, ' +
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdr) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyr) + ' County' LOCATION
from intersectionApproaches A INNER JOIN intersectionApproaches B ON A.ptid = B.ptid
where A.ptid in
(select distinct C.ptid from intersectionApproaches C, intersectionApproaches B
where C.ptid = B.ptid)
Edits: The DB server is MS SQL Server 2008. I have the execution plan but there doesn't seem to be a way to make it visible easily. It saves itself as a .sqlplan file. Not sure if that would work for anyone else. But I can provide some info from it.
29% of the time is spent doing a Distinct sort.
15% of the time is spent doing an "index spool" (it does this three different times)
6% is spent doing a "table spool" (only one time)
8% doing a "Sort"
To answer JNK: The original query was in a stored procedure and took a couple of parameters that limited the result set. Here is the original query with those parameters:
select distinct
A.ptid,
A.ptmgrx,
A.ptmgry,
A.fename,
B.fename XFENAME,
A.rt1 route,
A.pr,
A.mp,
A.rdbranch,
A.gs grade,
A.cs,
A.csmp,
A.cspath,
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdl) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyl) + ' County, ' +
(select distinct twpname from tamc.dbo.fipsmdot where fipscode = A.fmcdr) + '; ' +
(select distinct county from tamc.dbo.fipsmdot where fipsco = A.countyr) + ' County' LOCATION
from intersectionApproaches A INNER JOIN intersectionApproaches B ON A.ptid = B.ptid
where A.ptid in
(select distinct C.ptid from intersectionApproaches C, intersectionApproaches B
where C.ptid = B.ptid
and (C.fename + isnull(' ' + C.fetype,'') like @str + '%' or @str in (C.rt1name, C.rt2name, C.rt3name)) and (B.fename + isnull(' ' + B.fetype,'') like @xstr + '%' or @xstr in (B.rt1name, B.rt2name, B.rt3name)))
Hopefully this helps to explain what happened. I removed the last part of the where clause since I was trying to get a result set that was not filtered by those parameters. I want to move this into a view instead.
More Editing: The table, "IntersectionApproaches" is a table of approaches to an road intersection. For instance, the standard road intersection is composed of 4 approaches (one for each side of the intersection that traffic could approach the intersection from). The original purpose of the stored procedure, was to return a list of intersectionApproaches matching a particular pair of street names.
For instance, you have "Main" and "1st" and they have an intersection.
There are four intersection approaches for this one intersection.
"Main and 1st"
"1st and Main"
"1st and 1st"
"Main and Main"
They all share the same "PTID", which is the intersection ID.
However, the IntersectionApproaches table only has one field 开发者_如何学Cfor the name of the street "FENAME". To make the record complete, we need the name of the corresponding cross-street (hence the "B.Fename XFENAME" in the query). This allows us to query the result and say "get me all records with a street of 'Main' and a cross-street of '1st'." In addition, we also need the "LOCATION" value filled in as defined by the multiple "select distinct" statements in the query, because we might filter on that as well.
I was trying to make this into a view instead of having to supply the cross streets first to get a manageable record set and then filtering again on the Location field. The current stored procedure does nothing to help me with paging or sorting either. I just wanted to get it all into a view so I could work with it more naturally.
Hope this helps...
I count six occurrences of DISTINCT
in your code. That is wayyyyyyyyyyy too many.
Without even seeing your data, the DISTINCT
in the IN
clause at the end is completely unneeded. IN
doesn't care if you have dupes or not in the subquery since it short circuits.
It also looks like your subqueries can all be replaced by a JOIN
.
Right now you are querying fipsmdot
five times per row from your main table.
You also have what appears to be a completely unnecessary INNER JOIN
to the main table (self-join).
To get more details you will need to post some sample data, table structure, and what you are trying to get at.
Three Four things leap out:
Strange use of IN. This can be simplified because A.ptid = B.ptid = C.ptid- Subqueries per row: change to JOINs
- Why JOIN intersectionApproaches twice on the key? This forces the unnecessary outer DISTINCT
- (Thanks to Bill) Pointless WHERE
Try this to remedy some of the issues
select distinct
A.ptid,
A.ptmgrx,
A.ptmgry,
A.fename,
B.fename XFENAME,
A.rt1 route,
A.pr,
A.mp,
A.rdbranch,
A.gs grade,
A.cs,
A.csmp,
A.cspath,
a1.twpname '; ' +
a2.twpname + ' County, ' +
a3.twpname + '; ' +
a4.twpname + ' County' LOCATION
from
intersectionApproaches A
INNER JOIN
intersectionApproaches B ON A.ptid = B.ptid
JOIN
(select distinct twpname from tamc.dbo.fipsmdot) a1 ON a1.fipscode = A.fmcdl
JOIN
(select distinct twpname from tamc.dbo.fipsmdot) a2 ON a2.fipsco = A.countyl
JOIN
(select distinct twpname from tamc.dbo.fipsmdot) a3 ON a3.fipscode = A.fmcdr
JOIN
(select distinct twpname from tamc.dbo.fipsmdot) a4 ON a4.fipsco = A.countyr
/*
not needed as Bill said. Always true
where
EXISTS (SELECT *
FROM intersectionApproaches C
WHERE C.ptid = A.ptid)
*/
I am posting this here as an answer so others looking in the future will see this if needed.
I was able to use the info from @gbn to create a query that gave me the info I needed without the nested selects.
Here is the new query:
select
DISTINCT
A.ptid,
A.ptmgrx,
A.ptmgry,
A.fename,
B.fename XFENAME,
A.rt1 route,
A.pr,
A.mp,
A.rdbranch,
A.gs grade,
A.cs,
A.csmp,
A.cspath,
c1.twpname + '; ' +
c2.county + ' County, ' +
c3.twpname + '; ' +
c4.county + ' County' LOCATION,
A.rt1name,
A.rt2name,
A.rt3name,
A.fetype,
B.rt1name AS Xrt1name,
B.rt2name AS Xrt2name,
B.rt3name AS Xrt3name,
B.fetype AS Xfetype
from dbo.intersectionApproaches A
INNER JOIN intersectionApproaches B ON A.ptid = B.ptid
JOIN
(select distinct twpname, fipscode from tamc.dbo.fipsmdot) c1 ON c1.fipscode = A.fmcdl
JOIN
(select distinct county, fipsco from tamc.dbo.fipsmdot) c2 ON c2.fipsco = A.countyl
JOIN
(select distinct twpname, fipscode from tamc.dbo.fipsmdot) c3 ON c3.fipscode = A.fmcdr
JOIN
(select distinct county, fipsco from tamc.dbo.fipsmdot) c4 ON c4.fipsco = A.countyr
It still takes a while to run, but I guess it's the best I can do. As long as I filter the result set on something like ptid or fename, the performance is acceptable (2-3 seconds). Running the whole thing without any filters results in a query that still takes over 2.5 mins to run, but I don't think we'll be using it in that manner very often (if at all).
Thanks to all for your input and time. I think all the answers so far have led me to being able to come up with this improvement.
精彩评论