Choose a preferred row out of partially duplicate data
I have the following query:
select
mb.id as meter_id
,ds.mydate as mydate
,mb.name as metergroup
,sum(ms.stand) as measured_cum_value
,me.name as energy_medium
,e.name as unit_of_measure
,min(ms.source) as source
,count(*) as debugcount
FROM datumselect ds <<-- mem table with dates to query.
INNER JOIN metergroup mb ON (mb.building_id = 1)
INNER JOIN meter m ON (m.metergroup_id = mb.id) <<-- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id) <<-- lookuptables for normalization
INNER JOIN unit e ON (e.id = mb.unit_id) <<-- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
group by ds.mydate, mb.id, ms.source <<-- this is prob. broken.
having source = MIN(ms.source) <<-- this `having` does not work !
ORDER BY mb.id, ds.mydate
I'm selecting from the following table:
CREATE TABLE meterstand(
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
meter_id INT(11) UNSIGNED NOT NULL,
mydate DATETIME NOT NULL,
stand DECIMAL(16, 5) NOT NULL,
source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
PRIMARY KEY (id),
INDEX FK_meterstand_meter_id (meter_id),
UNIQUE INDEX UK_meterstand (datum, meter_id, bron),
CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
开发者_如何学运维AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;
A simpler query given the below data would be:
SELECT
meter_id
, mydate
, sum(stand)
, count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, my_date
HAVING the_best(source)
Given the current data debugcount
should be 1 always, but if there are multiple meter's in a group in the above query debugcount
should be the number of meters in the group.
I can choose between values from different source, I have:
-manual
source, this is golden;
- read
sources from a datasource, a meter in a building somewhere;
- calculated
data, interpolated to make up for missing data.
A single datapoint having the same meter_id+mydate can have multiple sources.
The query should favourmanual
sources over read
and only select calculated
data if no other data is available.
Here is a sample of the data in meterstand
:
id meter_id mydate stand source
------------------------------------------------------
179 6 1-12-2010 94,75886 calculated
180 7 1-12-2010 256,02618 calculated
164 7 1-1-2011 285,41800 manual <<--- Query should only consider this row.
183 7 1-1-2011 0,00000 read <<-- and forget about this one
What's the proper query syntax to use to select the best data points?
From the looks of it, MySQL defines sort ordering for enums as being the order in which they were listed in the definition. Given that you've defined the order as the reverse of which they are to appear, I believe the following will work as expected (no instance to test against, though):
SELECT *
FROM meterstand as a
JOIN (SELECT meter_id, mydate, MAX(source) as source
FROM meterstand
GROUP BY meter_id, mydate) as b
ON b.meter_id = a.meter_id
AND b.mydate = a.mydate
AND b.source = a.source
(Assuming that [meter_id, mydate, source] is unique, of course).
It does look like there was a bug at one point that was causing the enums to be sorted by their string values (which won't help you at all, given the strings).
If it sill exists (or you want a little more control over use-order), you may want to define a table:
Meter_Reading_Type
========================
Id Description Priority
1 Manual 10
2 Calculated 30
3 Read 20
Then reference it as a fk and sort by (min) priority.
精彩评论