A way to improve performance of the following query (update table)
This question is related to a question I published a while ago, and can be found here: Update values of database with values that are already in DB .
I've the following situation: A table that stores data from different sensors (I've a total of 8 sensors). Each row of the table has the following structure:
SensorID --- TimestampMS --- RawData --- Data
So, for example, for a temperature sensor called TEMPSensor1 I have the following:
TEMPSensor1 --- 1000 -开发者_运维知识库-- 200 --- 2
TEMPSensor1 --- 2000 --- 220 --- 2.2
And so on, for each sensor (in total I've 8). I've some problems reading the data, and there are rows which data "is not correct". Precisely when the rawdata field is 65535, I should update that particular row. And what I would like to do is put the next value (in time) to that "corrupted data". So, if we have this:
TEMPSensor1 --- 1000 --- 200 --- 2
TEMPSensor1 --- 2000 --- 220 --- 2.2 TEMPSensor1 --- 3000 --- 65535 --- 655.35 TEMPSensor1 --- 4000 --- 240 --- 2.4
After doing the Update, the content of the table should be changed to:
TEMPSensor1 --- 1000 --- 200 --- 2
TEMPSensor1 --- 2000 --- 220 --- 2.2 TEMPSensor1 --- 3000 --- 240 --- 2.4 TEMPSensor1 --- 4000 --- 240 --- 2.4
I've ended up doing the following:
UPDATE externalsensor es1
INNER JOIN externalsensor es2 ON es1.sensorid = es2.sensorid AND (es2.timestampms - es1.timestampms) > 60000 AND (es2.timestampms - es1.timestampms) < 120000 AND es1.rawdata <> 65535
SET es1.rawdata = es2.rawdata, es1.data = es2.data
WHERE es1.rawdata = 65535
Because I know that between two reads from a sensor there are between 60000 and 120000 ms. However, if I have two following "corrupted" readings, that won't work. Can anyone suggest a way to do this more efficiently, with no use of subquery selects, but JOINS? My idea would be to have a JOIN that gives you all the possible values for that sensor after its timestampms, and just get the first one, but I don't know how I can limit that JOIN result.
Appreciate.
Here's a solution without correlated subqueries, but with a triangular join (not sure which is worse):
UPDATE externalsensor bad
INNER JOIN (
SELECT
es1.SensorID,
es1.TimestampMS,
MIN(es2.TimestampMS) AS NextGoodTimestamp
FROM externalsensor es1
INNER JOIN externalsensor es2
ON es1.SensorID = es2.SensorID AND
es1.TimestampMS < es2.TimestampMS
WHERE es1.RawData = 65535
AND es2.RawData <> 65535
GROUP BY
es1.SensorID,
es1.TimestampMS
) link ON bad.SensorID = link.SensorID AND
bad.TimestampMS = link.TimestampMS
INNER JOIN externalsensor good
ON link.SensorID = good.SensorID AND
link.NextGoodTimestamp = good.TimestampMS
SET
bad.RawData = good.RawData,
bad.Data = good.Data
It is assumed that the timestamps are unique within a single sensor group.
A completely different approach, using a procedure that runs through the whole table (in descending time order for every sensor):
DELIMITER $$
CREATE PROCEDURE updateMyTable()
BEGIN
SET @dummy := -9999 ;
SET @dummy2 := -9999 ;
SET @sensor := -999 ;
UPDATE myTable m
JOIN
( SELECT n.SensorID
, n.TimestampMS
, @d := (n.RawData = 65535) AND (@sensor = n.SensorID) AS problem
, @dummy := IF(@d, @dummy, n.RawData) as goodRawData
, @dummy2 := IF(@d, @dummy2, n.Data) as goodData
, @sensor := n.SensorID AS previous
FROM myTable n
ORDER BY n.SensorID
, n.TimeStampMS DESC
) AS upd
ON m.SensorID = upd.SensorID
AND m.TimeStampMS = upd.TimeStampMS
SET m.RawData = upd.goodRawData
, m.Data = upd.goodData
WHERE upd.problem
;
END$$
DELIMITER ;
Since you don't want to use Dems solution from the previous question, here's a "solution" with JOIN's:
UPDATE myTable m
JOIN myTable n
ON m.SensorID = n.SensorID
AND n.RawData <> 65535
AND m.TimestampMS < n.TimestampMS
JOIN myTable q
ON n.SensorID = q.SensorID
AND q.RawData <> 65535
AND n.TimestampMS <= q.TimestampMS
SET
m.RawData = n.RawData,
m.Data = n.Data
WHERE
m.RawData = 65535
;
EDIT
My query above is wrong, dead wrong. It appears to be working in my test db but the logic is flawed. I'll explain below.
Why the above query works fine but is dead wrong:
First, why it's wrong.
Because it will not return one row for every (sensorID, bad timestamp) combination but many rows. If m
(m.TimestampMS
) is the bad timestamp we want to find, it will return all combinations of that bad timetsamp and later good timestamps n
and q
with n.TimestampMS <= q.TimestampMS
. It would be a correct query if it found the MINIMUM of these n
timestamps.
Now, how come it actually works all right in my test db?
I think it's because MySQL, when it comes to use the SET ...
and has a lot of options (rows) it just uses first option. But lucky me, I added the test rows in increasing timestamp order so they were saved in that order in the db, and (again) lucky me, this is how the query plan is scheduled (I presume).
Even this query works in my test db:
UPDATE myTable m
JOIN myTable n
ON m.SensorID = n.SensorID
AND n.RawData <> 65535
AND m.TimestampMS < n.TimestampMS
SET
m.RawData = n.RawData,
m.Data = n.Data
WHERE
m.RawData = 65535
;
while being flawed for the same reasons.
精彩评论