开发者

SQL Sub-query or INNER-JOIN?

I've the two following queries:

declare @UserId as int
set @UserId = 1

-- Query #1: Sub-query
SELECT
    u.[Id] ,
    u.[Name] ,
    u.[OrgId] AS Organization,
    (SELECT o.[Name] FROM Org o WHERE o.Id = u.OrgId) As OrganizationName,
    [UserRoleId] AS UserRole,
    [UserCode] AS UserCode,
    [EmailAddress] As EmailAddress, 
    (SELECT SearchExpression FROM SearchCriteria WHERE UserId = @UserId AND IsDefault=1 ) AS SearchCriteria,
    (SELECT PageSize FROM UserPreferences WHERE UserId = @UserId) AS UserPreferencePageSize,
    (SELECT DrilldownPageSize FROM UserPreferences WHERE UserId = @UserId) AS UserPreferenceDrilldownPageSize
    FROM [User] as u
WHERE u.Id = @UserId

-- Query #2: LEFT OUTER JOIN-query
SELECT
    u.[Id] ,
    u.[Name] ,
    u.[OrgId] AS Organization,
    (SELECT o.[Name] FROM Org o WHERE o.Id = u.OrgId) As OrganizationName,
    [UserRoleId] AS UserRole,
    [UserCode] AS UserCode,
    [EmailAddres开发者_开发技巧s] As EmailAddress, 
    sc.SearchExpression As SearchExpression,
    up.PageSize As PageSize,
    up.DrilldownPageSize As DrilldownPageSize    
    FROM [User] as u
LEFT OUTER JOIN [UserPreferences] as up ON u.id = up.UserId
LEFT OUTER JOIN [SearchCriteria] as sc ON u.id = sc.UserId
    WHERE ISNULL(sc.IsDefault,1)=1 AND u.Id = @UserId

Query execution plan statistics: (Query cost relative to batch)

  • Query#1 (Sub-Query) : 56%
  • Query#2 (JOIN) : 44%

I thot the sub-query would be optimal because the sub-query will be executed after the WHERE filter is applied. The statistics say the Query#2 - JOIN approach is better.

Pls suggest. Also as a moderate SQL-Server user how can I derive which query is better (anything other then execution-plan, if it is more helpful)

Thank you.


join is faster than subquery.

subquery makes for busy disk access, think of hard disk's read-write needle(head?) that goes back and forth when it access: User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.

join works by concentrating the operation on the result of the first two tables, any subsequent joins would concentrate joining on the in-memory(or cached to disk) result of the first joined tables, and so on. less read-write needle movement, thus faster


The best thing you can do is try both and compare what gives you the best performance. It's difficult to second guess what the query optimiser will do (you could write 2 different queries that actually end up being optimised to the same execution plan).

To compare performance fairly, you should ensure you try them from a level playing field by clearing down the execution plan and data cache before trying each one. This can be done using the following commands, though only do this on a development/test db server:

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

The approach I usually take is to run each query 3 times, with SQL Profiler running so I can monitor the duration, reads, CPU and writes of the query which I then base my decision on.

e.g.
1) clear cache using above commands
2) run query and record stats
3) clear cache
4) run query again
5) run query again (this will use cached execution plan/data)

Then repeat for the second query to compare.


The relative cost of an execution plan isn't always a reliable indicator of performance.

I assume from your sql that only 1 row should be returned. Providing that the UserId is a unique key on User, then the performance of your 2 approaches will be similar on most relational databases.

Things to bear in mind would be:

  • if UserPreferences or SearchCriteria return more than 1 row, the first approach will raise an sql error, the second approach will return more than 1 row.
  • the apparent extra lookup in the first approach (UserPreferences selected twice) has no real effect because for the second lookup the record will already be in a buffer
  • if for some reason the User table is tablespace scanned, the first approach will be much faster


It will depend largely on the cardinality of your data: if your in-line lookups are minimal compared to the overhead of joining vast amounts of data (when you only need to extract a small subsection from that join result), then the inline option will be quicker. But if you are having substantial overhead with in-line selects (i.e. if your result has a large number of rows, and you are calling out an inline select for each and every row), then the join will be quicker.

I can't see from yourquestion the numbers involved (i.e. how many rows) so it's hard to make qualitative comment.

For example, if your result set has 10 rows, then the inline selects will be carried out for each of those ten rows only, whereas the join might be involving far more rows, which are then selectively reduced by WHERE clauses. But if you have a resultset of 10 million rows, the inline selects will most probably kill performance, since its row-by-row.

EXAMPLE: imagine you have to collect a load of bricks (specified by size etc.) from all over a building yard and paint them blue.

inline select = Selecting all the bricks you need and then painting them by hand.

join = dump all the bricks into a huge bucket of paint, anf then choose the ones you need

If you only want to end up with 10 bricks, it is far quicker to select and then paint by hand. If you want a million bricks, then mass-painting them in a tub first is the way to go.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜