Effective way to separate a group into individual records
I'm grouping some records by their proximity of time. What I do right now (timestamps in unixtime),
First off I do a sub select to grab records that are of interest of me,
(SELECT timestamp AS target_time FROM table WHERE something = cool) AS subselect
Then I want to look at the records that are close in time to those,
SELECT id FROM table, subselect WHERE ABS(target_time - timestamp) < 1800
But here is where I hit my problem. I want to only want the records where the time diffrance between the records around the target_time
is > 20 mins. So to do this, I group by the target_time
and add a HAVING
section.
SELECT id FROM table, first WHERE ABS(target_time - timestamp) < 3600
GROUP BY target_time HAVING MAX(timestamp) - MIN(timestamp) > 1200
This is great, and all the records I don't like are gone, but now I only have the first id
of the group, when I really want all of the ids
. I can use GROUP_CONCAT
but that gives me a be mess I can't do anymore queries on. What I really would like it to get all of the ids
returned from all of these groups that are created. Do I need another SELECT
statement?开发者_运维问答 Or is there just a better way to structure what I got?
Thank you,
A SQL nub.
See if I have your problem correct:
For a given row in a table, you want to know the set of rows for similar records if the range of timestamps for those records is greater than 20 minutes. You want to to this for all ids in the table.
If you simply want a list of ids which fulfil this criteria, it is fairly straightforward:
given a table like:
create table foo (id bigint(4), section VARCHAR(2), modification datetime);
you can do:
select id, foo.section, min_max.min_modification, min_max.max_modification, abs(min_max.min_modification - min_max.max_modification) as diff
from foo,
(select section, max(modification) max_modification, min(modification) min_modification from foo as inner_foo group by section) as min_max
where foo.section = min_max.section
and abs(min_max.min_modification - min_max.max_modification) > 1800;
You're doing a subselect based on the 'similar rows' criteria (in this case the column section) to get the minimum and maximum timestamps for that section. This min and max applies to all ids in that section. Hence, for section 'A', you will have a list of ids, same for section 'B'.
My assumption is you want an output that looks like:
id1, timestamp1, fieldA, fieldB
id1, timestamp2, fieldA, fieldB
id2, timestamp3, fieldA, fieldB
id2, timestamp4, fieldA, fieldB
id3, timestamp5, fieldA, fieldB
id3, timestamp6, fieldA, fieldB
but the timestamp for these records is BETWEEN 1200 and 1800 seconds of a "target_time" where something = cool?
SELECT data.id, data.timestamp, data.fieldA, data.fieldB, ..., data.fieldX
FROM events
JOIN data
WHERE events.something = cool_event -- Gives the 'target_time' of cool_event
AND ABS(event.timestamp - data.timestamp) BETWEEN 1200 and 1800 -- gives data records 'near' target time, but at least 20 minutes away.
IF the 'data' and 'events' table are the SAME table, then just use table alias names, but you can join a table to itself, aka 'SELF-JOIN'.
SELECT data.id, data.timestamp, data.fieldA, data.fieldB, ..., data.fieldX
FROM events AS target, events AS data
WHERE target.something = cool_event -- gives the 'target_time' of cool_event
AND ABS(target.timestamp - data.timestamp) BETWEEN 1200 and 1800 -- gives data records 'near' target time, but at least 20 minutes away.
This sounds about right, and is without any group-by or aggregates needed.
You can order the resulting data if necessary.
-- J Jorgenson --
精彩评论