In SQL Server, is TOP deterministic by default when used on a table with a clustered index?

2023-02-10 07:15 问答作者：

So I was trying to explain to some people why this query is a bad idea:

SELECT z.ReportDate, z.Zipcode, SUM(z.Sales) AS Sales,
COALESCE(
  (SELECT TOP (1) GroupName
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)), 'Unknown') AS GroupName,
COALESCE(
  (SELECT TOP (1) GroupCode
  FROM dbo.zipGroups
  WHERE (Zipcode = z.Zipcode)), 0) AS GroupNumber
FROM dbo.Report_ByZipcode AS z
GROUP BY z.ReportDate, z.Zipcode

and suggesting a better way to write it, when my boss ended the discussion with, "Well, it's been ret开发者_开发百科urning the right data for the last year and we haven't had any problems with it, so it's fine."

At which point I thought to myself, how in the world is that even possible?

After some digging, I discovered these facts:

This query is supposed to group sales by Zipcode and date, and link those to the largest Group (by population size) that a Zipcode is assigned to by way of the zipGroups table.
Each Zipcode can be assigned to 0 to many Groups, and if a Zipcode is assigned to 0 Groups, it's simply not in the zipGroups table.
A Group is a geographical area, and the GroupNumbers are ranked by largest to smallest by population (for example, the group covering the NY-NJ-CT tri-state area is GroupNumber 1, and North Platte, Nebraska is GroupNumber 209).
The zipGroups table has not changed in at least 2 years.
The zipGroups table has a clustered index with Zipcode, GroupNumber (ascending) as the keys.
The combination of Zipcode, GroupNumber is unique in zipGroups.

So my question has 2 parts.

A) Even though there are no ORDER BY clauses in those SELECT TOP queries, are they actually deterministic because the clustered index is basically providing it a default ORDER BY?

B1) If that is true, is the query, however precariously, actually doing what it's supposed to do?

B2) If that is not true, can you help me prove it?

Note: I've already re-written this to use joins, so I don't need the SQL to fix it, I need to get it into production so I stop worrying about it breaking.

SQL Server makes no guarantees about the ordering of records in the absence of ORDER BY. It might yield the correct results 999,999 times and then fail on the millionth try. Don't do it.

Always use an order by with a TOP statement. The order is not guaranteed to be in the order of the clustered index as demonstrate in this blog post (complete with a query that disproves it):

Without ORDER BY, there is no default sort order.

Even if it did go by the clustered index, I wouldn't write queries that depend on undocumented behavior of the DB engine and it is better to be explicit for readability.

If you're relying on a clustered index rather than the collation, then getting the right order is coincidental, not deterministic.

In the real world, indexes can be changed from one kind to another, for good reasons, bad reasons, or no reason at all. And, in the real world, you don't necessarily get to choose which index SQL Server will use in executing a query. (Or whether it will use an index at all.)

Technically, the collation can also be changed for good reasons, bad reasons, or no reason at all. But everybody knows changing the collation will change the sort order--that's its job, after all--so it's not a surprise. (Ever heard of "the principle of least surprise"?)

The link by JohnFx is good, although long and hard to follow. Here's a small snippet on it's own that will show the data returning in non-clustered index order.

CREATE TABLE t1 (x INT NOT NULL PRIMARY KEY CLUSTERED, z INT NOT NULL UNIQUE);

INSERT INTO t1 (x,z) VALUES (1,4);
INSERT INTO t1 (x,z) VALUES (3,3);
INSERT INTO t1 (x,z) VALUES (2,2);
INSERT INTO t1 (x,z) VALUES (4,1);

SELECT x, z FROM t1;

Output (you should get)

x           z
----------- -----------
4           1
2           2
3           3
1           4

The execution plan shows it using the Unique (or other) index instead of the clustered index.

Even if the clustered index is chosen, it may not sort correctly if the data is being merged from parallelism, if the TOP N count is high enough.

Having said that, since you are only using TOP(1) and if the table has only one index available, it can be considered deterministic since it will only use that index and pick the first entry in the index pages.

A) Even though there are no ORDER BY clauses in those SELECT TOP queries, are they actually deterministic because the clustered index is basically providing it a default ORDER BY? B1) If that is true, is the query, however precariously, actually doing what it's supposed to do?

When top is specified without ordering, the ordering is a side effect of the method of access chosen by the query optimizer. Since the query optimizer would use the clustered index to resolve this query, you get a quite nice side effect.

I wouldn't use the word deterministic, as the query optimizer might not be deterministic. However in the case where the optimizer choses the clustered index, yes - the query does what it is supposed to do.

ORDER should still be specified, so as to lock the correctness into the query. One should separate correctness ("What do you want") and implementation ("How do you get it") into query and optimizer plan, respectively.

B2) If that is not true, can you help me prove it?

Assuming there are more columns in the ZipGroups table, a Nonclustered index containing the only two relevant columns could be added that would be preferred over the clustered index. If the nonclustered index had a different ordering (Zipcode asc, GroupNumber desc), then the query would break.

继续阅读：sql sql-server

In SQL Server, is TOP deterministic by default when used on a table with a clustered index?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？