Number of rows with specified number of null columns

How can I find out how many rows in a table contain a particular number of NULLs?

I want to get something like this:

Number of nulls | Number of rows
        0       |      10
        1       |       4
        2       |      11

Motivation:

I need this for data mining purposes. If, for example, an observation has NULL in almost all of its columns, I need to get rid of that observation. There can also be a situation where many observations have only a small number of NULLs, which is acceptable.


With just SQL, you'll have to resort to unpleasant code, something like:

SELECT CASE WHEN column1 IS NULL THEN 1 ELSE 0 END
       + CASE WHEN column2 IS NULL THEN 1 ELSE 0 END
       + ... AS num_nulls,
       COUNT(*) as num_rows
FROM table 
GROUP BY num_nulls;

Also note that not all SQL dialects support referencing a calculated column by its alias in the GROUP BY clause, so you might end up with a much uglier query: either the whole CASE sum repeated in the GROUP BY, or the calculation wrapped in a subquery. Needless to say, you'll also have to write a different query for each table. You could of course generate the query using some INFORMATION_SCHEMA voodoo...
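To make the "generate the query" idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it targets SQLite, which has no INFORMATION_SCHEMA, so the column names come from SQLite's pragma_table_info function instead. The helper names (quote_ident, build_null_count_query, null_histogram) and the demo table obs are made up for this example. On another database you would read the column list from INFORMATION_SCHEMA.COLUMNS and adapt the identifier quoting.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier for SQLite, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_null_count_query(conn, table):
    """Return SQL that counts rows per number of NULL columns in `table`."""
    # Assumption: SQLite 3.16+ so that pragma_table_info can be queried like
    # a table. Other databases would read INFORMATION_SCHEMA.COLUMNS instead.
    rows = conn.execute("SELECT name FROM pragma_table_info(?)", (table,))
    columns = [name for (name,) in rows]
    if not columns:
        raise ValueError("unknown table or table without columns: " + table)
    # One CASE expression per column, summed into a per-row NULL count.
    null_sum = " + ".join(
        "CASE WHEN {} IS NULL THEN 1 ELSE 0 END".format(quote_ident(c))
        for c in columns
    )
    # The subquery avoids relying on alias support in GROUP BY.
    return (
        "SELECT num_nulls, COUNT(*) AS num_rows "
        "FROM (SELECT {} AS num_nulls FROM {}) t "
        "GROUP BY num_nulls ORDER BY num_nulls"
    ).format(null_sum, quote_ident(table))


def null_histogram(conn, table):
    """Run the generated query and return (num_nulls, num_rows) pairs."""
    return conn.execute(build_null_count_query(conn, table)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE obs (pk INTEGER PRIMARY KEY, c1 INT, c2 INT, c3 TEXT)"
    )
    conn.executemany(
        "INSERT INTO obs VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "foo"), (2, 1, None, "bar"),
         (3, None, None, None), (4, None, None, None)],
    )
    print(null_histogram(conn, "obs"))  # [(0, 1), (1, 1), (3, 2)]
```

The key column is included in the sum as well; since it is never NULL it just adds 0 to every row, but you may prefer to filter it out of the column list.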


Something like:

select MyCountCol, count(*) as NumRows
from (select 0
             + case when Col1 is null then 1 else 0 end
             + case when Col2 is null then 1 else 0 end
             -- + one case expression for each remaining column in your table
             as MyCountCol
      from MyTable) t  -- most dialects require an alias on the derived table
group by MyCountCol


For SQL Server 2008 or later, you can do:

DECLARE @T TABLE
(
pk INT PRIMARY KEY, 
c1 INT,
c2 INT,
c3 VARCHAR(10)
)

INSERT INTO @T
SELECT 1,1,1,'foo'
UNION ALL
SELECT 2,1,NULL,'bar'
UNION ALL
SELECT 3,NULL,NULL,NULL
UNION ALL
SELECT 4,NULL,NULL,NULL


SELECT Num AS [Number of Nulls],
       COUNT(*) AS [Number of rows]
FROM @T
CROSS APPLY (SELECT COUNT(*) - COUNT(c) FROM (VALUES(cast(c1 as SQL_VARIANT)),
                                                    (c2),
                                                    (c3)) T (c)) CA(Num)
GROUP BY Num