In SQL Server, what performs better: where ProcessedDate is null or where Processed = 0
I am trying to decide on which approach to take in a database I am designing.
I will be adding a ProcessedDate datetime null
column in a table. It will be nullable for when the record has not been processed.
So, is it worth having a Processed bit not null default 0
column as well?
With the following queries:
select * from tablename where ProcessedDate is null
and
select * from tablename where Processed = 0
All things being equal*, what is the performance difference between the two versions?
*: Appropriate indices are applied to the table in each version. I am not looking for advice on what indices to create; I only want information about the performance of the filter as applied to a single row. Whether all rows in the table need to be scanned or a seek is done is irrelevant to the question at hand.
I know that an argument could be made that having the Processed flag is more explicit and therefore more readable; however, it also raises issues around keeping the columns in sync (which could be handled by using a computed column). Anyway, I want to limit the scope of this question to the performance side of things.
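For what it's worth, the computed-column approach mentioned above might look something like the sketch below. The table and column names other than ProcessedDate/Processed are illustrative, not from the question:

```sql
-- Sketch: derive the flag from ProcessedDate so the two can never drift apart.
-- Table name and the Payload column are illustrative.
CREATE TABLE dbo.WorkItem
(
    Id            int IDENTITY PRIMARY KEY,
    Payload       nvarchar(100) NOT NULL,
    ProcessedDate datetime NULL,
    -- Persisted computed flag: 0 while unprocessed, 1 once a date is set.
    Processed AS CAST(CASE WHEN ProcessedDate IS NULL THEN 0 ELSE 1 END AS bit) PERSISTED
);
```

Because the column is computed, there is no sync logic to maintain, and PERSISTED lets it be indexed if needed.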
As far as I can tell, they are identical given equal sets of data. You can run my example below and look at the execution plans to check. You'll see that the filter element, where it tests the predicate, takes up the same proportion of time in each. Even if indexes were added, one wouldn't be more efficient than the other, as they are basically doing the same thing.
WITH Test (MyInt, MyNull)
AS
(
SELECT 1 AS MyInt, Null AS MyNull
UNION ALL
SELECT
MyInt + 1,
CASE
WHEN MyInt % 2 = 0 THEN Null
ELSE MyInt
END
FROM Test
WHERE MyInt < 10000
)
SELECT * FROM Test
WHERE MyNull IS NULL
OPTION (MAXRECURSION 32767);
WITH Test (MyInt, MyBit)
AS
(
SELECT 1 AS MyInt, 0 AS MyBit
UNION ALL
SELECT
MyInt + 1,
CASE
WHEN MyInt % 2 = 0 THEN 0
ELSE 1
END
FROM Test
WHERE MyInt < 10000
)
SELECT * FROM Test
WHERE MyBit = 0
OPTION (MAXRECURSION 32767);
I know my example is contrived, but it uses equal numbers of NULLs and 0s. Even if you rewrite the CTE in each example into a table variable, you should see the same performance.
As with most SQL performance questions, the answer will be driven by your table schema, not by your SQL text. As things stand, both versions require a full table scan, so they are equally bad.
A non-clustered index on ProcessedDate may help the first version, but since you ask for * in the projection, the index tipping point may kick in and you still end up with a full scan.
On the other hand, a clustered index with Processed as the leftmost key will always work, and such a clustered index structure is common in queue processing (and having a ProcessedDate column is a clear giveaway that your table is used as a queue).
But then a clustered index with ProcessedDate as the leftmost key would work just as well as the Processed bit.
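As a sketch of the two alternatives (a table can have only one clustered index, so you would pick one or the other; table and column names are illustrative):

```sql
-- Option A: cluster on the bit flag first, so unprocessed rows sit together.
CREATE CLUSTERED INDEX CIX_WorkItem_Processed
    ON dbo.WorkItem (Processed, Id);

-- Option B: cluster on the nullable date instead. NULLs sort lowest in
-- SQL Server, so unprocessed rows are again grouped at one end of the index.
CREATE CLUSTERED INDEX CIX_WorkItem_ProcessedDate
    ON dbo.WorkItem (ProcessedDate, Id);
```

Either way, a dequeue-style query touching only unprocessed rows can seek straight to that range of the clustered index.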
The lesson to take home: it's not the query, it's the schema. Always.
You should not have both.
Pick the one that matches your domain model best.
With the right index, any performance difference will be negligible (assuming that the relative distribution of 'processed' to 'not processed' rows means the optimizer will pick an index in the first place).
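One way to cope with that skew on SQL Server 2008 and later is a filtered index that covers only the unprocessed rows; this is a sketch, with illustrative table and column names:

```sql
-- Filtered index: only rows where ProcessedDate IS NULL are stored in it,
-- so it stays small even when the vast majority of rows are processed.
CREATE NONCLUSTERED INDEX IX_WorkItem_Unprocessed
    ON dbo.WorkItem (Id)
    INCLUDE (Payload)
    WHERE ProcessedDate IS NULL;
```

A query filtering on ProcessedDate IS NULL can then use this index regardless of how lopsided the overall distribution becomes.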
One advantage of the ProcessedDate is null
predicate is that you don't risk getting bitten by (explicit or implicit) parametrization.
If you use where Processed = 0
, SQL Server might internally parametrize this to Processed = @p
and then generate the query plan based on the assumption that @p = 0. However, if elsewhere you run the same query with where Processed = 1
, you might get a plan for @p = 1, and since the data is likely to be skewed, it's likely that the plans won't be the same.
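If you do keep the bit column, one common mitigation for such a stale parametrized plan is to force a per-execution compile with a query hint (a sketch; the table name is illustrative):

```sql
-- Recompile this statement every time, so the estimate reflects the actual
-- literal (0 or 1) instead of a previously sniffed parameter value.
SELECT *
FROM dbo.WorkItem
WHERE Processed = 0
OPTION (RECOMPILE);
```

The trade-off is compilation cost on every execution, which is usually acceptable for an occasional queue-draining query.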