开发者

Filtering bad data in SQL Server 2005

Using SQL Server 2005

I have a table with the following columns

id name date value

I would like to select all rows from the table where there are not four consecutive zeroes by date. How would I do that? Below is an example of what I mean.

id      name     date         value
1       a        1/1/2010     5
2       a        1/2/2010     3
3       a        1/3/2010     5
4       a        1/4/2010     0
5       a        1/7/2010     0
6       a        1/8/2010     0
7       a        1/9/2010     2
8       a        1/10/2010    3
9       a        1/11/2010    0
10      a        1/15/2010    0
11      a        1/16/2010    0
12 开发者_开发百科     a        1/17/2010    0
13      a        1/20/2010    4
14      a        1/21/2010    4

I would like the result of the query to include all rows except id 9-12.


This is assuming you ordered the rows by ID but you can simply change the ORDER BY id to something else and it should still work.

Using the T-SQL CTE found on this Kodyaz Development Resources site I was able to create the below code. I have it working so it deletes the rows where there are two consecutive zeroes, not 4, as I tested it on my code and just changed the table/row names.

WITH CTE as (
  SELECT
    RN = ROW_NUMBER() OVER (ORDER BY id),
    *
  FROM tablename
)
SELECT
  [Current Row].*
FROM CTE [Current Row]
LEFT JOIN CTE [Previous Row] ON
  [Previous Row].RN = [Current Row].RN - 1
LEFT JOIN CTE [Next Row] ON
  [Next Row].RN = [Current Row].RN + 1
WHERE
  not([Current Row].value = 0 AND [Next Row].value = 0) AND  
     // this deletes the row where value is zero and the next rows value is zero
  not([Previous Row].value = 0 AND [Current Row].value = 0) 
     // this deletes the row where value is zero and the previous rows value is zero

All you have to do to make it work for your case is putting each possible combination within the WHERE statement. For example dealing with this row and the 3 next rows equaling 0 OR this row the previous and the 2 next rows.


You don't mention how the name is involved, so I'm assuming that you want this done by name. I'm going to further assume that when you talk about "consecutive" that you mean in date order, not in id order. Finally, I'm also going to assume that you would also exclude 5 zeroes in a row, 6 zeroes in a row, etc.

There may be a simpler way, but this should work:

;WITH Transitions_To_CTE AS
(
    SELECT
        T1.id,
        T1.name,
        T1.date,
        T1.value
    FROM
        My_Table T1
    LEFT OUTER JOIN My_Table T2 ON
        T2.name = T1.name AND
        T2.date < T1.date AND
        T2.value <> 0
    LEFT OUTER JOIN My_Table T3 ON
        T3.name = T1.name AND
        T3.date > COALESCE(T2.date, '1900-01-01') AND
        T3.date < T1.date
    WHERE
        T1.value = 0 AND
        T3.id IS NULL
),
Transitions_From_CTE AS
(
    SELECT
        T1.id,
        T1.name,
        T1.date,
        T1.value
    FROM
        My_Table T1
    LEFT OUTER JOIN My_Table T2 ON
        T2.name = T1.name AND
        T2.date > T1.date AND
        T2.value <> 0
    LEFT OUTER JOIN My_Table T3 ON
        T3.name = T1.name AND
        T3.date < COALESCE(T2.date, '9999-12-31') AND
        T3.date > T1.date
    WHERE
        T1.value = 0 AND
        T3.id IS NULL
),
Range_Exclusions AS
(
    SELECT
        S.name,
        S.date AS start_date,
        E.date AS end_date
    FROM
        Transitions_To_CTE S
    INNER JOIN Transitions_From_CTE E ON
        E.name = S.name AND
        E.date > S.date
    LEFT OUTER JOIN Transitions_From_CTE E2 ON
        E2.name = S.name AND
        E2.date > S.date AND
        E2.date < E.date
    WHERE
        E2.id IS NULL AND
        (SELECT COUNT(*) FROM dbo.My_Table T WHERE T.name = S.name AND T.date BETWEEN S.date AND E.date) >= 4
)
SELECT
    T.id,
    T.name,
    T.date,
    T.value
FROM
    dbo.My_Table T
WHERE
    NOT EXISTS (SELECT * FROM Range_Exclusions RE WHERE RE.name = T.name AND T.date BETWEEN RE.start_date AND RE.end_date)


Here's my attempt, use a recursive cte to work out the number of consecutive zeros, then create a sequence of IDs using the level > 4 and then simply do a not in clause on the ids.

with trend --work out number of consecutive zeros using level
as
(Select 1 as level, id, value, id as startid
    from IdsAndValues
    Union All
    Select [Level]+1, P.ID, p.value, t.startid
    From IdsAndValues as p
        Inner Join trend as t on p.id = t.id+1
    Where t.value =0 and p.value=0
)
,IDs --create sequence of ids using startid and  id, this allows us to do the not in
as
(   
    Select  startid as ExcludeID ,id 
    from trend as t--
    Where level>=4
    Union All
    Select ExcludeID +1, id
    From ids 
    where ExcludeID <id 
)

Select *
from IdsAndValues
Where id Not in
    (Select ExcludeID from IDs)
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜