Which query seems efficient?
I want to know which of the 2 queries below is faster :-
Select s.*,
sm.*
from tblStudent s
Inner Join (SELECT studentId,SUM(marks) As Sum
from tblStudentsMarks
Group By studentId) as sm on s.StudentID = sm.StudentID;
...or:
Select s.studentId,
s.Name,
SUM(Marks)
From tblStudent s
Inner Join tblStudentsMarks sm On s.Studentid = sm.StudentId
Group By s.studentId, s.Name;
EDIT :-
Query Estimation of 1st Qu开发者_运维知识库ery :- http://img28.imageshack.us/img28/166/1stpicd.jpg
Query Estimation of 2nd Query :- http://img245.imageshack.us/img245/5064/2ndpic.jpg
Thanks in advance :)
You can run both of them and use execution plan to compare each of them.
After reviewing the Execution Plans, the first query is more efficient.
Both have a table scan on tblStudentMarks
, but the percentage is much lower on the first query compared to the second one. The SORT isn't ideal, but being that the table scan is in both - probably easier to deal with by adding an index. Both the SORT and table scan could be improved by reviewing the indexes...
Depends on your data and indexes. Generally, SQL Server is said to be better at optimizing joins than subqueries, and since it's also more readable (and thus more maintainable and less likely to cause bugs), I'd go with option 2 (join) for now, and see if you hit any performance roadblocks. If this is a speed critical query, I'd try both and compare the results. Make sure you use realistic data for testing though.
Your example execution plans seem somewhat unrealistic. You have no useful indexes at all on tblStudentsMarks
and I suspect can't have added many rows at all to either of the two tables. Setting up a possibly more realistic test
CREATE TABLE tblStudent(
studentId INT identity(1,1) primary key,
Name varchar(50),
filler char(2000));
create nonclustered index ix on tblStudent (StudentId, Name);
Insert into tblStudent (Name)
select top 10000 newid()
from sys.objects s1,sys.objects s2,sys.objects s3,sys.objects s4;
CREATE TABLE tblStudentsMarks(
examid int not null
,studentId INT foreign key references tblStudent not null
,marks decimal(5,2) not null
,primary key clustered (studentId, examid))
insert into tblStudentsMarks
select abs(checksum(newid())) as examid, studentId ,abs(checksum(newid()))/10000000 as marks
from tblStudent cross join (select top 5 1 C from sys.objects) d
where studentid % 3 =0
Gives a different conclusion
If we get rid of the *
in Query 1 and replace it with Select s.studentId, s.Name, sm.Sum
both of the query plans turn out exactly the same.
One reason your second query is better is because you are not using the *
wild-card.
explaination to grok
To speed up, don't use Select *. Try also Left Join instead on Join (only if produce the same result)
Choose the one that is faster on your production server (depending on the data volume, index fragmentation and index selectivity the query optimizer can choose a different plan on production compared with the test server). When you test don't forget to empty the cache between runs. Test on real amount of data because the plan can change in time.
In real life ( if is fast enough ) I will go always for the second solution because is more readable and maintainable.
I know this is kind of extreme but I have a similar query that I run against some very large tables. Assuming you have a clustered index on tblStudent.studentId this might give you the best results. SQL likes joins on clustered indexes. Oh I wish TSQL would support contraints on derived tables so did not have to resort to #temp. In my case the derived table (your first ran in 18 seconds). You second approach ran in 20 some seconds (on my data). And the #temp below with a clustered index to clustered index ran in less than 1 seconds. This is on my data with millions of row. If your query is running in less the 1/10 second then it is not worth the overhead of a #temp.
CREATE TABLE #sumMarks ([StudentId] [int] NOT NULL, PRIMARY KEY CLUSTERED ([StudentId] ASC), [MarksSum] [int])
GO
INSERT INTO #sumMarks
SELECT [studentId], sum(Marks) as [MarksSum]
FROM [tblStudentsMarks] with (nolock)
group by [tblStudentsMarks].[studentId]
ORDER by [tblStudentsMarks].[studentId];
go
Select s.studentId, s.Name, sm.[MarksSum]
From tblStudent s
Inner Join #sumMarks as sm On s.Studentid = sm.StudentId;
go
DROP TABLE #sumMarks;
精彩评论