Number of rows with specified number of null columns
How to get information how many rows in a table have particular number of nulls in it?
I want to get something like this:Number of nulls | Number of rows
0 | 10
1 | 4
2 | 11
开发者_Go百科
Motivation:
I need this for data mining purposes. If for example I have an observation that has in almost all columns null value then I need to get rid of this observation, but there can be also a situation where many observations have small value of nulls which is acceptable.With just SQL, you'll have to resort to unpleasant code, something like:
SELECT CASE WHEN column1 IS NULL THEN 1 ELSE 0 END
+ CASE WHEN column2 IS NULL THEN 1 ELSE 0 END
+ ... AS num_nulls,
COUNT(*) as num_rows
FROM table
GROUP BY num_nulls;
Also note that not all SQL dialects support referencing a calculated column by alas in the GROUP BY
clause, so you might end up with a much uglier query. Needless to say, you'll also have to make a different query for each table. You could of course generate the query using some INFORMATION_SCHEMA
voodoo...
Something like:
select MyCol, Count(MyCountCol) from
(select 0 +
case when Col1 is null then 1 else 0 end
+ case when Col2 is null then 1 else 0 end
-- + whatever other col names are in your table
as MyCountCol
from MyTable)
group by MyCol
For SQL Server 2008 you can do
DECLARE @T TABLE
(
pk INT PRIMARY KEY,
c1 INT,
c2 INT,
c3 VARCHAR(10)
)
INSERT INTO @T
SELECT 1,1,1,'foo'
UNION ALL
SELECT 2,1,NULL,'bar'
UNION ALL
SELECT 3,NULL,NULL,NULL
UNION ALL
SELECT 4,NULL,NULL,NULL
SELECT Num AS [Number of Nulls],
COUNT(*) AS [Number of rows]
FROM @T
CROSS APPLY (SELECT COUNT(*) - COUNT(c) FROM (VALUES(cast(c1 as SQL_VARIANT)),
(c2),
(c3)) T (c)) CA(Num)
GROUP BY Num
精彩评论