开发者

A way to improve performance of the following query (update table)

This question is related to a question I published a while ago, and can be found here: Update values of database with values that are already in DB .

I've the following situation: A table that stores data from different sensors (I've a total of 8 sensors). Each row of the table has the following structure:

SensorID --- TimestampMS --- RawData --- Data

So, for example, for a temperature sensor called TEMPSensor1 I have the following:

TEMPSensor1 --- 1000 -开发者_运维知识库-- 200 --- 2

TEMPSensor1 --- 2000 --- 220 --- 2.2

And so on, for each sensor (in total I've 8). I've some problems reading the data, and there are rows which data "is not correct". Precisely when the rawdata field is 65535, I should update that particular row. And what I would like to do is put the next value (in time) to that "corrupted data". So, if we have this:

TEMPSensor1 --- 1000 --- 200 --- 2

TEMPSensor1 --- 2000 --- 220 --- 2.2

TEMPSensor1 --- 3000 --- 65535 --- 655.35

TEMPSensor1 --- 4000 --- 240 --- 2.4

After doing the Update, the content of the table should be changed to:

TEMPSensor1 --- 1000 --- 200 --- 2

TEMPSensor1 --- 2000 --- 220 --- 2.2

TEMPSensor1 --- 3000 --- 240 --- 2.4

TEMPSensor1 --- 4000 --- 240 --- 2.4

I've ended up doing the following:

UPDATE externalsensor es1
INNER JOIN externalsensor es2 ON es1.sensorid = es2.sensorid AND (es2.timestampms - es1.timestampms) > 60000 AND (es2.timestampms - es1.timestampms) < 120000  AND es1.rawdata <> 65535
SET es1.rawdata = es2.rawdata, es1.data = es2.data
WHERE es1.rawdata = 65535

Because I know that between two reads from a sensor there are between 60000 and 120000 ms. However, if I have two following "corrupted" readings, that won't work. Can anyone suggest a way to do this more efficiently, with no use of subquery selects, but JOINS? My idea would be to have a JOIN that gives you all the possible values for that sensor after its timestampms, and just get the first one, but I don't know how I can limit that JOIN result.

Appreciate.


Here's a solution without correlated subqueries, but with a triangular join (not sure which is worse):

UPDATE externalsensor bad

  INNER JOIN (
    SELECT
      es1.SensorID,
      es1.TimestampMS,
      MIN(es2.TimestampMS) AS NextGoodTimestamp
    FROM externalsensor es1
      INNER JOIN externalsensor es2
        ON es1.SensorID = es2.SensorID AND
           es1.TimestampMS < es2.TimestampMS
    WHERE es1.RawData = 65535
      AND es2.RawData <> 65535
    GROUP BY
      es1.SensorID,
      es1.TimestampMS
  ) link ON bad.SensorID = link.SensorID AND
            bad.TimestampMS = link.TimestampMS

  INNER JOIN externalsensor good
    ON link.SensorID = good.SensorID AND
       link.NextGoodTimestamp = good.TimestampMS

SET
  bad.RawData = good.RawData,
  bad.Data = good.Data

It is assumed that the timestamps are unique within a single sensor group.


A completely different approach, using a procedure that runs through the whole table (in descending time order for every sensor):

DELIMITER $$
CREATE PROCEDURE updateMyTable()
BEGIN
  SET @dummy := -9999 ;
  SET @dummy2 := -9999 ;
  SET @sensor := -999 ;

  UPDATE myTable m
    JOIN 
    ( SELECT n.SensorID
           , n.TimestampMS
           , @d := (n.RawData = 65535) AND (@sensor = n.SensorID) AS problem
           , @dummy  := IF(@d, @dummy, n.RawData) as goodRawData
           , @dummy2 := IF(@d, @dummy2, n.Data) as goodData
           , @sensor := n.SensorID AS previous
      FROM myTable n
      ORDER BY n.SensorID
             , n.TimeStampMS DESC
    ) AS upd
    ON m.SensorID = upd.SensorID
      AND m.TimeStampMS = upd.TimeStampMS
  SET m.RawData = upd.goodRawData
    , m.Data = upd.goodData
  WHERE upd.problem
  ;
END$$
DELIMITER ;


Since you don't want to use Dems solution from the previous question, here's a "solution" with JOIN's:

UPDATE myTable m
  JOIN myTable n
    ON m.SensorID = n.SensorID
      AND n.RawData <> 65535
      AND m.TimestampMS < n.TimestampMS
  JOIN myTable q
    ON n.SensorID = q.SensorID
      AND q.RawData <> 65535
      AND n.TimestampMS <= q.TimestampMS
SET
  m.RawData = n.RawData,
  m.Data = n.Data
WHERE
   m.RawData = 65535
;

EDIT

My query above is wrong, dead wrong. It appears to be working in my test db but the logic is flawed. I'll explain below.

Why the above query works fine but is dead wrong:

First, why it's wrong.

Because it will not return one row for every (sensorID, bad timestamp) combination but many rows. If m (m.TimestampMS) is the bad timestamp we want to find, it will return all combinations of that bad timetsamp and later good timestamps n and q with n.TimestampMS <= q.TimestampMS. It would be a correct query if it found the MINIMUM of these n timestamps.

Now, how come it actually works all right in my test db?

I think it's because MySQL, when it comes to use the SET ... and has a lot of options (rows) it just uses first option. But lucky me, I added the test rows in increasing timestamp order so they were saved in that order in the db, and (again) lucky me, this is how the query plan is scheduled (I presume).

Even this query works in my test db:

UPDATE myTable m
  JOIN myTable n
    ON m.SensorID = n.SensorID
      AND n.RawData <> 65535
      AND m.TimestampMS < n.TimestampMS
SET
  m.RawData = n.RawData,
  m.Data = n.Data
WHERE
   m.RawData = 65535
;

while being flawed for the same reasons.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜