开发者

Postgres SQL select a range of records spaced out by a given interval

I am trying to determine if it is possible, using only sql for postgres, to select a range of time ordered re开发者_运维百科cords at a given interval.

Lets say I have 60 records, one record for each minute in a given hour. I want to select records at 5 minute intervals for that hour. The resulting rows should be 12 records each one 5 minutes apart.

This is currently accomplished by selecting the full range of records and then looping thru the results and pulling out the records at the given interval. I am trying to see if I can do this purly in sql as our db is large and we may be dealing with tens of thousands of records.

Any thoughts?


Yes you can. Its really easy once you get the hang of it. I think its one of jewels of SQL and its especially easy in PostgreSQL because of its excellent temporal support. Often, complex functions can turn into very simple queries in SQL that can scale and be indexed properly.

This uses generate_series to draw up sample time stamps that are spaced 1 minute apart. The outer query then extracts the minute and uses modulo to find the values that are 5 minutes apart.

select
    ts,
    extract(minute from ts)::integer as minute

    from
    ( -- generate some time stamps - one minute apart
        select
            current_time + (n || ' minute')::interval  as ts
        from generate_series(1, 30) as n
    ) as timestamps
    -- extract the minute check if its on a 5 minute interval
    where extract(minute from ts)::integer % 5 = 0
    -- only pick this hour 
    and extract(hour from ts) = extract(hour from current_time)
;
         ts         | minute 
--------------------+--------
 19:40:53.508836-07 |     40
 19:45:53.508836-07 |     45
 19:50:53.508836-07 |     50
 19:55:53.508836-07 |     55

Notice how you could add an computed index on the where clause (where the value of the expression would make up the index) could lead to major speed improvements. Maybe not very selective in this case, but good to be aware of.

I wrote a reservation system once in PostgreSQL (which had lots of temporal logic where date intervals could not overlap) and never had to resort to iterative methods.

http://www.amazon.com/SQL-Design-Patterns-Programming-Focus/dp/0977671542 is an excellent book that goes has lots of interval examples. Hard to find in book stores now but well worth it.


Extract the minutes, convert to int4, and see, if the remainder from dividing by 5 is 0:

select * 
  from TABLE 
  where int4 (date_part ('minute', COLUMN)) % 5 = 0; 


  • If the intervals are not time based, and you just want every 5th row; or
  • If the times are regular and you always have one record per minute

The below gives you one record per every 5

select *
from
(
  select *, row_number() over (order by timecolumn) as rown
  from tbl
) X
where mod(rown, 5) = 1

If your time records are not regular, then you need to generate a time series (given in another answer) and left join that into your table, group by the time column (from the series) and pick the MAX time from your table that is less than the time column.

Pseudo

select thetimeinterval, max(timecolumn)
from ( < the time series subquery > ) X
left join tbl on tbl.timecolumn <= thetimeinterval
group by thetimeinterval

And further join it back to the table for the full record (assuming unique times)

select t.* from
tbl inner join
(
    select thetimeinterval, max(timecolumn) timecolumn
    from ( < the time series subquery > ) X
    left join tbl on tbl.timecolumn <= thetimeinterval
    group by thetimeinterval
) y on tbl.timecolumn = y.timecolumn


How about this:

select min(ts), extract(minute from ts)::integer / 5 
   as bucket group by bucket order by bucket; 

This has the advantage of doing the right thing if you have two readings for the same minute, or your readings skip a minute. Instead of using min even better would be to use one of the the first() aggregate functions-- code for which you can find here:

http://wiki.postgresql.org/wiki/First_%28aggregate%29


This assumes that your five minute intervals are "on the fives", so to speak. That is, that you want 07:00, 07:05, 07:10, not 07:02, 07:07, 07:12. It also assumes you don't have two rows within the same minute, which might not be a safe assumption.

select your_timestamp
from your_table
where cast(extract(minute from your_timestamp) as integer) in (0,5);

If you might have two rows with timestamps within the same minute, like

2011-01-01 07:00:02
2011-01-01 07:00:59

then this version is safer.

select min(your_timestamp)
from your_table
group by (cast(extract(minute from your_timestamp) as integer) / 5)

Wrap either of those in a view, and you can join it to your base table.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜