SQL: selecting rows where column value changed from previous row
Let's say I have this (MySQL) table, sorted by increasing timestamp:
Timestamp   System  StatusA  StatusB
2011-01-01  A       Ok       Ok
2011-01-02  B       Ok       Ok
2011-01-03  A       Fail     Fail
2011-01-04  B       Ok       Fail
2011-01-05  A       Fail     Ok
2011-01-06  A       Ok       Ok
2011-01-07  B       Fail     Fail
How do I select the rows where StatusA changed from the previous row for that system? StatusB doesn't matter (I show it in this question only to illustrate that there may be many consecutive rows for each system where StatusA doesn't change). In the example above, the query should return the rows 2011-01-03 (StatusA changed between 2011-01-01 and 2011-01-03 for system A), 2011-01-06 and 2011-01-07.
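To pin down the expected result, here is a small plain-Python reference (not SQL, and not taken from any answer) that applies "compare StatusA with the previous row of the same system" to the sample data. The helper name status_a_changes is hypothetical; it can serve as a cross-check for the SQL answers.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question: (Timestamp, System, StatusA, StatusB).
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]


def status_a_changes(rows):
    """Hypothetical helper: timestamps whose StatusA differs from the
    previous row (by Timestamp) of the same System."""
    changed = []
    # Sort by (System, Timestamp) so each system's rows are contiguous and in time order.
    ordered = sorted(rows, key=itemgetter(1, 0))
    for _system, history in groupby(ordered, key=itemgetter(1)):
        prev = None  # the first row of a system has no predecessor
        for ts, _sys, status_a, _status_b in history:
            if prev is not None and status_a != prev:
                changed.append(ts)
            prev = status_a
    return sorted(changed)


print(status_a_changes(ROWS))  # ['2011-01-03', '2011-01-06', '2011-01-07']
```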
The query should execute quickly with the table having tens of thousands of records.
Thanks
SELECT a.*
FROM tableX AS a
WHERE a.StatusA <>
( SELECT b.StatusA
FROM tableX AS b
WHERE a.System = b.System
AND a.Timestamp > b.Timestamp
ORDER BY b.Timestamp DESC
LIMIT 1
)
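A minimal way to try the correlated-subquery query without a MySQL server, assuming Python's built-in sqlite3 module as a stand-in database. The query text is unchanged except for a final ORDER BY added for a stable result. For each system's first row the subquery yields NULL, so the <> test is not true and that row is skipped.

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# Correlated-subquery answer; only the final ORDER BY is new.
QUERY = """
SELECT a.*
FROM tableX AS a
WHERE a.StatusA <>
      (SELECT b.StatusA
       FROM tableX AS b
       WHERE a.System = b.System
         AND a.Timestamp > b.Timestamp
       ORDER BY b.Timestamp DESC
       LIMIT 1)
ORDER BY a.Timestamp
"""

con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)

result = [row[0] for row in con.execute(QUERY)]
print(result)  # ['2011-01-03', '2011-01-06', '2011-01-07']
```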
But you can also try this (with an index on (System, Timestamp)):
SELECT System, Timestamp, StatusA, StatusB
FROM
( SELECT (@statusPre <> statusA AND @systemPre=System) AS statusChanged
, System, Timestamp, StatusA, StatusB
, @statusPre := StatusA
, @systemPre := System
FROM tableX
, (SELECT @statusPre:=NULL, @systemPre:=NULL) AS d
ORDER BY System
, Timestamp
) AS good
WHERE statusChanged ;
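The user-variable query is a single pass over the rows in (System, Timestamp) order that carries the previous row's values forward. As a sketch of that logic only (plain Python, not MySQL), @statusPre and @systemPre become ordinary local variables below; scan_changes is a hypothetical name. Python guarantees the comparison runs before the reassignment, whereas MySQL's documentation says the order of evaluation for expressions involving user variables is undefined, so the SQL version relies on behavior that is not guaranteed.

```python
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]


def scan_changes(rows):
    """Hypothetical single ordered scan mirroring the @statusPre/@systemPre query."""
    out = []
    status_pre = None  # SELECT @statusPre := NULL
    system_pre = None  # SELECT @systemPre := NULL
    # ORDER BY System, Timestamp
    for ts, system, status_a, status_b in sorted(rows, key=lambda r: (r[1], r[0])):
        # (@statusPre <> StatusA AND @systemPre = System); None plays the part of NULL
        status_changed = (
            status_pre is not None and system_pre == system and status_pre != status_a
        )
        if status_changed:
            out.append((ts, system, status_a, status_b))
        # @statusPre := StatusA, @systemPre := System, applied after the test
        status_pre, system_pre = status_a, system
    return out


for row in scan_changes(ROWS):
    print(row)
```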
Use ROWNUM (this is Oracle syntax).
I got 0.05 seconds on 20000 rows:
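-- number the rows in (SYSTEM, TIMESTAMP) order so that, within a system,
-- consecutive numbers mark consecutive rows
select a1.*
from (select rownum R_NUM, s.* from (select TIMESTAMP, SYSTEM, STATUSA from TABLEX order by SYSTEM, TIMESTAMP) s) a1
join (select rownum R_NUM, s.* from (select TIMESTAMP, SYSTEM, STATUSA from TABLEX order by SYSTEM, TIMESTAMP) s) a2
on a1.R_NUM = a2.R_NUM + 1
where a1.SYSTEM = a2.SYSTEM
and a1.STATUSA != a2.STATUSA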
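ROWNUM numbers rows in the order they are fetched, and the self-join pairs each row with the one numbered just before it. That neighbour is the previous row of the same system only when the rows are numbered in (System, Timestamp) order. The sketch below assumes Python's sqlite3 backed by SQLite 3.25 or later (for window functions) and uses ROW_NUMBER() to simulate ROWNUM; rownum_changes is a hypothetical helper that compares numbering by Timestamp alone with numbering by System, Timestamp on the sample data.

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]


def rownum_changes(order_by):
    # ROW_NUMBER() stands in for Oracle's ROWNUM over rows fetched in `order_by` order.
    numbered = (
        f"SELECT ROW_NUMBER() OVER (ORDER BY {order_by}) AS R_NUM, "
        "Timestamp, System, StatusA FROM tableX"
    )
    query = f"""
        SELECT a1.Timestamp
        FROM ({numbered}) AS a1
        JOIN ({numbered}) AS a2 ON a1.R_NUM = a2.R_NUM + 1
        WHERE a1.System = a2.System
          AND a1.StatusA != a2.StatusA
        ORDER BY a1.Timestamp
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
    con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)
    return [row[0] for row in con.execute(query)]


print(rownum_changes("Timestamp"))          # ['2011-01-06']
print(rownum_changes("System, Timestamp"))  # ['2011-01-03', '2011-01-06', '2011-01-07']
```

With Timestamp-only numbering, rows of different systems interleave, so most same-system neighbours are never paired.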
select a.Timestamp, a.System, a.StatusA, a.StatusB
from tableX as a
cross join tableX as b
where a.System = b.System
and a.Timestamp > b.Timestamp
and not exists (select *
from tableX as c
where a.System = c.System
and a.Timestamp > c.Timestamp
and c.Timestamp > b.Timestamp
)
and a.StatusA <> b.StatusA;
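The NOT EXISTS subquery is what makes b the row immediately before a: it rejects any pair that has a row c of the same system in between. A quick check, assuming Python's sqlite3 as a stand-in for MySQL, with an ORDER BY added for a stable result:

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

QUERY = """
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
CROSS JOIN tableX AS b
WHERE a.System = b.System
  AND a.Timestamp > b.Timestamp
  AND NOT EXISTS (SELECT *                      -- no row c of the same system
                  FROM tableX AS c              -- lies strictly between b and a
                  WHERE a.System = c.System
                    AND a.Timestamp > c.Timestamp
                    AND c.Timestamp > b.Timestamp)
  AND a.StatusA <> b.StatusA
ORDER BY a.Timestamp
"""

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)

result = [row[0] for row in con.execute(QUERY)]
print(result)  # ['2011-01-03', '2011-01-06', '2011-01-07']
```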
Update addressing a comment: Why not use an inner join instead of a cross join?
The question asks for a MySQL solution. According to the documentation:
In MySQL, CROSS JOIN is a syntactic equivalent to INNER JOIN (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.
This means that either of these joins would work.
The conditional_expr used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, you should use the ON clause for conditions that specify how to join tables, and the WHERE clause to restrict which rows you want in the result set.
The condition a.System = b.System probably falls under the 'how to join tables' category, so using an INNER JOIN would be nicer in this case.
Since both produce the same results, any difference would be in performance. To say which is faster, I would need to know how the joins are implemented internally: whether they use indexes or hashing to do the joining.
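Whether the two join spellings return the same rows is easy to check empirically. A minimal sketch, assuming SQLite (via Python's sqlite3) in place of MySQL: the same query is run once with CROSS JOIN plus WHERE and once with INNER JOIN ... ON, and the result lists are compared. This says nothing about relative speed.

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# Shared filter: b must be a's immediate predecessor with a different StatusA.
TAIL = """
  AND NOT EXISTS (SELECT * FROM tableX AS c
                  WHERE a.System = c.System
                    AND a.Timestamp > c.Timestamp
                    AND c.Timestamp > b.Timestamp)
  AND a.StatusA <> b.StatusA
ORDER BY a.Timestamp
"""

CROSS_JOIN_QUERY = """
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
CROSS JOIN tableX AS b
WHERE a.System = b.System AND a.Timestamp > b.Timestamp""" + TAIL

INNER_JOIN_QUERY = """
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
INNER JOIN tableX AS b ON a.System = b.System
WHERE a.Timestamp > b.Timestamp""" + TAIL

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)

cross_rows = con.execute(CROSS_JOIN_QUERY).fetchall()
inner_rows = con.execute(INNER_JOIN_QUERY).fetchall()
print(cross_rows == inner_rows)  # True
```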
Here's a slightly shorter version with similar logic. I've tested this often enough to be confident it's efficient, primarily because it eliminates the correlated subquery (WHERE NOT EXISTS).
The join to c makes sure that b is the row directly before a: it looks for a row c of the same system between them, and the IS NULL test requires that no such row is found.
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
JOIN tableX AS b
ON a.System = b.System
AND a.Timestamp > b.Timestamp
LEFT JOIN tableX AS c
ON a.System = c.System
AND a.Timestamp > c.Timestamp
AND b.Timestamp < c.Timestamp
WHERE c.System IS NULL
AND a.StatusA <> b.StatusA;
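The LEFT JOIN ... IS NULL pattern is an anti-join: a pair (a, b) survives only if no row c of the same system lies strictly between them. That depends on the c join comparing a.System with c.System; comparing with b.System (already equal to a.System) lets rows of other systems count as "in between". A sketch with Python's sqlite3 standing in for MySQL; left_join_changes and its system_alias parameter are hypothetical, used only to switch between the two conditions.

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]


def left_join_changes(system_alias):
    # system_alias "c": rows in between must share a's system; "b": any system counts.
    query = f"""
        SELECT a.Timestamp
        FROM tableX AS a
        JOIN tableX AS b
          ON a.System = b.System
         AND a.Timestamp > b.Timestamp
        LEFT JOIN tableX AS c
          ON a.System = {system_alias}.System
         AND a.Timestamp > c.Timestamp
         AND b.Timestamp < c.Timestamp
        WHERE c.System IS NULL
          AND a.StatusA <> b.StatusA
        ORDER BY a.Timestamp
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
    con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)
    return [row[0] for row in con.execute(query)]


print(left_join_changes("c"))  # ['2011-01-03', '2011-01-06', '2011-01-07']
print(left_join_changes("b"))  # ['2011-01-06']
```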
Egor's answer worked for me in MSSQL with one small change: I had to replace the ROWNUM expressions with:
select row_number() over (order by System, TIMESTAMP) as R_NUM, ...
SELECT a.*
FROM (select row_number() over (partition by System order by Timestamp asc) as aRow, Timestamp, System, StatusA, StatusB from tableX) as a
left join (select row_number() over (partition by System order by Timestamp asc) as bRow, Timestamp, System, StatusA, StatusB from tableX) as b on a.aRow = b.bRow + 1 and a.System = b.System
where (a.StatusA != b.StatusA or b.StatusA is null)
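It returns the first row of each system as well as the rows where the value changed. Drop the "or b.StatusA is null" condition to return only the changed rows.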
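A sketch of the ROW_NUMBER() self-join run locally, assuming Python's sqlite3 backed by SQLite 3.25 or later (for window functions). The row-number column is called rn in both derived tables here instead of aRow/bRow, and row_number_changes with its include_first flag is a hypothetical helper that toggles the "or b.StatusA is null" condition.

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

NUMBERED = (
    "SELECT row_number() OVER (PARTITION BY System ORDER BY Timestamp ASC) AS rn, "
    "Timestamp, System, StatusA, StatusB FROM tableX"
)


def row_number_changes(include_first=True):
    condition = "a.StatusA != b.StatusA"
    if include_first:
        # A system's first row has no predecessor, so the LEFT JOIN leaves b NULL.
        condition += " OR b.StatusA IS NULL"
    query = f"""
        SELECT a.Timestamp
        FROM ({NUMBERED}) AS a
        LEFT JOIN ({NUMBERED}) AS b
          ON a.rn = b.rn + 1 AND a.System = b.System
        WHERE {condition}
        ORDER BY a.Timestamp
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
    con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)
    return [row[0] for row in con.execute(query)]


print(row_number_changes())                     # first rows plus changes
print(row_number_changes(include_first=False))  # changes only
```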
This is exactly what window functions are built for. LAG gives the exact answer:
create table t1 (ts date, sys char(1),stata varchar(10),statb varchar(10));
insert into t1 values
('2011-01-01','A','Ok','Ok'),
('2011-01-02','B','Ok','Ok'),
('2011-01-03','A','Fail','Fail'),
('2011-01-04','B','Ok','Fail'),
('2011-01-05','A','Fail','Ok'),
('2011-01-06','A','Ok','Ok'),
('2011-01-07','B','Fail','Fail');
select * from (
select ts,sys,stata,lag(stata) over(partition by sys order by ts asc) as prev from t1
) as subsel where stata!=prev
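For a quick local run of the LAG query, here is a sketch assuming Python's sqlite3 module backed by SQLite 3.25 or later (which supports LAG). The sample values are written without leading spaces, and an ORDER BY is added for stable output. Each system's first row has prev = NULL, so stata != prev is not true there and those rows drop out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (ts date, sys char(1), stata varchar(10), statb varchar(10))")
con.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        ("2011-01-01", "A", "Ok", "Ok"),
        ("2011-01-02", "B", "Ok", "Ok"),
        ("2011-01-03", "A", "Fail", "Fail"),
        ("2011-01-04", "B", "Ok", "Fail"),
        ("2011-01-05", "A", "Fail", "Ok"),
        ("2011-01-06", "A", "Ok", "Ok"),
        ("2011-01-07", "B", "Fail", "Fail"),
    ],
)

QUERY = """
SELECT * FROM (
  SELECT ts, sys, stata,
         lag(stata) OVER (PARTITION BY sys ORDER BY ts ASC) AS prev
  FROM t1
) AS subsel
WHERE stata != prev
ORDER BY ts
"""

rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)
# ('2011-01-03', 'A', 'Fail', 'Ok')
# ('2011-01-06', 'A', 'Ok', 'Fail')
# ('2011-01-07', 'B', 'Fail', 'Ok')
```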
Select * from tableX
Qualify lag(StatusA) over (Partition by System order by Timestamp) is distinct from StatusA
;
Since MySQL has neither QUALIFY nor IS DISTINCT FROM, compute lag() in a derived table and filter in the outer query:
SELECT
Timestamp, System, StatusA, StatusB
FROM (
SELECT
t.*, lag(StatusA) OVER (Partition by System order by Timestamp) as prev
FROM
tableX AS t
) a
WHERE
-- prev IS NULL keeps each system's first row, as IS DISTINCT FROM would
a.prev != a.StatusA OR a.prev IS NULL
;
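The outer filter has to treat a NULL prev (a system's first row) as "distinct", which is what the prev IS NULL branch does. In MySQL the null-safe equality operator <=> offers another spelling, NOT (prev <=> StatusA); SQLite, used in this sketch through Python's sqlite3 (3.25 or later for lag), writes the same test as prev IS NOT StatusA. The hypothetical helper distinct_from_prev runs both filters on the sample data so they can be compared.

```python
import sqlite3

ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)")
con.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", ROWS)

INNER = """
SELECT t.*, lag(StatusA) OVER (PARTITION BY System ORDER BY Timestamp) AS prev
FROM tableX AS t
"""


def distinct_from_prev(condition):
    query = f"SELECT Timestamp FROM ({INNER}) AS a WHERE {condition} ORDER BY Timestamp"
    return [row[0] for row in con.execute(query)]


explicit = distinct_from_prev("a.prev != a.StatusA OR a.prev IS NULL")
null_safe = distinct_from_prev("a.prev IS NOT a.StatusA")  # SQLite's null-safe inequality
print(explicit)  # ['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-06', '2011-01-07']
print(explicit == null_safe)  # True
```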