Query database in weekly interval

I have a database with a created_at column containing the datetime in Y-m-d H:i:s format.

The latest datetime entry is 2011-09-28 00:10:02.

I need the query to be relative to the latest datetime entry.

  1. The first value in the query should be the latest datetime entry.
  2. The second value in the query should be the entry closest to 7 days from the first value.
  3. The third value should be the entry closest to 7 days from the second value.
  4. REPEAT #3.

What I mean by "closest to 7 days from":

The values below are dates. The interval I want is one week, which is 604800 seconds.

7 days before the first value is 1316578202 (1317183002 - 604800).

The value closest to 1316578202 is 1316571974.

unix timestamp | Y-m-d H:i:s

1317183002 | 2011-09-28 00:10:02 -> appear in query (first value)
1317101233 | 2011-09-27 01:27:13
1317009182 | 2011-09-25 23:53:02
1316916554 | 2011-09-24 22:09:14
1316836656 | 2011-09-23 23:57:36
1316745220 | 2011-09-22 22:33:40
1316659915 | 2011-09-21 22:51:55
1316571974 | 2011-09-20 22:26:14 -> closest to 7 days from 1317183002 (first value)
1316499187 | 2011-09-20 02:13:07
1316064243 | 2011-09-15 01:24:03
1315967707 | 2011-09-13 22:35:07 -> closest to 7 days from 1316571974 (second value)
1315881414 | 2011-09-12 22:36:54
1315794048 | 2011-09-11 22:20:48
1315715786 | 2011-09-11 00:36:26
1315622142 | 2011-09-09 22:35:42

I would really appreciate any help. I have not been able to do this in MySQL, and no online resources seem to deal with relative date manipulation like this. I would like the query to be modular enough that the interval can be changed to weekly, monthly, or yearly. Thanks in advance!

Answer #1 Reply:

SELECT
UNIX_TIMESTAMP(created_at) 
AS unix_timestamp,
(
  SELECT MIN(UNIX_TIMESTAMP(created_at))
  FROM my_table
  WHERE created_at >=
    (
    SELECT max(created_at) - 7
    FROM my_table
    )
)
AS `random_1`,
(
  SELECT MIN(UNIX_TIMESTAMP(created_at))
  FROM my_table
  WHERE created_at >=
    (
    SELECT MAX(created_at) - 14
    FROM my_table
    )
)
AS `random_2`
FROM my_table
WHERE created_at =
(
SELECT MAX(created_at)
FROM my_table
)

Returns:

unix_timestamp | random_1 | random_2
1317183002 | 1317183002 | 1317183002

Answer #2 Reply:

RESULT SET:

This is the result set for a yearly interval:

id  | created_at          | period_index | period_timestamp
267 | 2010-09-27 22:57:05 | 0            | 1317183002
1   | 2009-12-10 15:08:00 | 1            | 1285554786

I desire this result:

id  | created_at          | period_index | period_timestamp
626 | 2011-09-28 00:10:02 | 0            | 0
267 | 2010-09-27 22:57:05 | 1            | 1317183002

I hope this makes more sense.


It's not exactly what you asked for, but the following example is pretty close.

Example 1:

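-- Bucket each row by how many whole 7-day periods (604800 seconds) it lies
-- before the most recent timestamp, then keep the latest timestamp per bucket.
select
  floor(timestampdiff(SECOND, tbl.time, most_recent.time)/604800) as period_index, 
  unix_timestamp(max(tbl.time)) as period_timestamp
from
  tbl
  , (select max(time) as time from tbl) most_recent
group by period_index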

gives results:

+--------------+------------------+
| period_index | period_timestamp |
+--------------+------------------+
|            0 |       1317183002 | 
|            1 |       1316571974 | 
|            2 |       1315967707 | 
+--------------+------------------+

This breaks the dataset into groups based on "periods", where (in this example) each period is 7 days (604800 seconds) long. The period_timestamp returned for each period is the latest (most recent) timestamp that falls within that period.

The period boundaries are all computed based on the most recent timestamp in the database, rather than computing each period's start and end time individually based on the timestamp of the period before it. The difference is subtle - your question requests the latter (iterative approach), but I'm hoping that the former (approach I've described here) will suffice for your needs, since SQL doesn't lend itself well to implementing iterative algorithms.
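
To make the grouping rule concrete, here is a rough Python sketch of the same bucketing applied to the sample timestamps from the question. The function name latest_per_period is made up for illustration. It only mirrors what the group-by does and is not a replacement for the query.

```python
WEEK = 604800  # seconds in 7 days

# Unix timestamps copied from the question's sample data.
timestamps = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]

def latest_per_period(stamps, period=WEEK):
    """Map period_index -> most recent timestamp inside that period.

    Hypothetical helper: every period boundary is measured back from the
    single most recent timestamp, just like the group-by query.
    """
    if not stamps:
        return {}
    most_recent = max(stamps)
    latest = {}
    for ts in stamps:
        idx = (most_recent - ts) // period  # whole periods before the newest row
        latest[idx] = max(latest.get(idx, ts), ts)
    return dict(sorted(latest.items()))

print(latest_per_period(timestamps))
# prints {0: 1317183002, 1: 1316571974, 2: 1315967707}
```

For this data the three values happen to match the ones marked in the question, but that is not guaranteed in general, because the boundaries never shift to follow the previously selected row.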


If you really do need to determine each period based on the timestamp in the previous period, then your best bet is going to be an iterative approach -- either using a programming language of your choice (like PHP), or by building a stored procedure that uses a cursor.


Edit #1

Here's the table structure for the above example.

CREATE TABLE `tbl` (
  `id` int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
  `time` datetime NOT NULL
) 

Edit #2

Ok, first: I've improved the original example query (see revised "Example 1" above). It still works the same way, and gives the same results, but it's cleaner, more efficient, and easier to understand.

Now... the query above is a group-by query, meaning it shows aggregate results for the "period" groups as I described above, not row-by-row results like a "normal" query. With a group-by query, you're limited to using aggregate columns only. Aggregate columns are those that are named in the group by clause or computed by an aggregate function (like MAX(time)). It is not possible to extract meaningful values for non-aggregate columns (like id) from within the projection of a group-by query.

Unfortunately, unless the ONLY_FULL_GROUP_BY SQL mode is enabled, MySQL doesn't generate an error when you try to do this. Instead, it picks an arbitrary value from within the grouped rows and shows that value for the non-aggregate column in the grouped result. This is what's causing the odd behavior the OP reported when trying to use the code from Example 1.

Fortunately, this problem is fairly easy to solve. Just wrap another query around the group query, to select the row-by-row information you're interested in...

Example 2:

SELECT 
  entries.id, 
  entries.time, 
  periods.idx as period_index, 
  unix_timestamp(periods.time) as period_timestamp
FROM 
  tbl entries
JOIN
  (select
     floor(timestampdiff( SECOND, tbl.time, most_recent.time)/31536000) as idx, 
     max(tbl.time) as time
   from
     tbl
     , (select max(time) as time from tbl) most_recent
   group by idx
  ) periods
ON entries.time = periods.time

Result:

+-----+---------------------+--------------+------------------+
| id  | time                | period_index | period_timestamp |
+-----+---------------------+--------------+------------------+
| 598 | 2011-09-28 04:10:02 |            0 |       1317183002 | 
| 996 | 2010-09-27 22:57:05 |            1 |       1285628225 | 
+-----+---------------------+--------------+------------------+

Notes:

  • Example 2 uses a period length of 31536000 seconds (365 days), while Example 1 (above) uses a period of 604800 seconds (7 days). Other than that, the inner query in Example 2 is the same as the primary query shown in Example 1.

  • If a matching period time belongs to more than one entry (i.e. two or more entries have exactly the same time, and that time matches one of the selected periods.time values), then the above query (Example 2) will include multiple rows for that period timestamp (one for each match). Whatever code consumes this result set should be prepared to handle this edge case.

  • It's also worth noting that these queries will perform much, much better if you define an index on your datetime column. For my example schema, that would look like this:

    ALTER TABLE tbl ADD INDEX idx_time ( time )
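
For the duplicate-timestamp edge case mentioned in the notes above, one possible way for the consuming code to cope is to keep a single row per period. This is only a sketch: the function name, the "lowest id wins" rule, and the row with id 599 are all made up for illustration.

```python
def one_row_per_period(rows):
    """Keep one row per period_index, preferring the smallest id.

    Assumption: each row is a dict shaped like the Example 2 result set.
    """
    chosen = {}
    for row in rows:
        idx = row["period_index"]
        if idx not in chosen or row["id"] < chosen[idx]["id"]:
            chosen[idx] = row
    return [chosen[idx] for idx in sorted(chosen)]

# Rows shaped like the Example 2 output; id 599 is a hypothetical duplicate
# that shares the exact same time as id 598.
rows = [
    {"id": 599, "time": "2011-09-28 04:10:02", "period_index": 0, "period_timestamp": 1317183002},
    {"id": 598, "time": "2011-09-28 04:10:02", "period_index": 0, "period_timestamp": 1317183002},
    {"id": 996, "time": "2010-09-27 22:57:05", "period_index": 1, "period_timestamp": 1285628225},
]

for row in one_row_per_period(rows):
    print(row["id"], row["period_index"], row["period_timestamp"])
# prints:
# 598 0 1317183002
# 996 1 1285628225
```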


If you're willing to settle for the earliest entry that falls at or after the point one week back (rather than the true closest entry on either side of it), then this will work. You can extend it to find the true closest entry, but it'll look so disgusting it's probably not worth it.

-- MySQL needs INTERVAL arithmetic here: max(sql_tstamp) - 7 would subtract 7
-- from the datetime's numeric form, not 7 days.
select unix_tstamp
     , ( select min(unix_tstamp)
           from my_table
          where sql_tstamp >= ( select max(sql_tstamp) - interval 7 day
                                  from my_table )
       )
     , ( select min(unix_tstamp)
           from my_table
          where sql_tstamp >= ( select max(sql_tstamp) - interval 14 day
                                  from my_table )
       )
  from my_table
 where sql_tstamp = ( select max(sql_tstamp)
                        from my_table )
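
To see how "earliest entry at or after the cutoff" differs from "closest entry on either side", here is a small Python sketch using a few of the timestamps from the question. Both helper names are hypothetical, and the list is assumed to be sorted ascending.

```python
import bisect

WEEK = 604800  # seconds in 7 days

def first_at_or_after(ordered, target):
    """Earliest timestamp >= target, or None if there is none."""
    pos = bisect.bisect_left(ordered, target)
    return ordered[pos] if pos < len(ordered) else None

def closest(ordered, target):
    """Timestamp with the smallest absolute distance to target."""
    return min(ordered, key=lambda ts: abs(ts - target))

# A slice of the question's sample data around the one-week mark.
ordered = sorted([
    1317183002, 1316745220, 1316659915, 1316571974, 1316499187,
])
cutoff = max(ordered) - WEEK  # 1316578202

print(first_at_or_after(ordered, cutoff))  # prints 1316659915
print(closest(ordered, cutoff))            # prints 1316571974
```

So on this data the simpler rule lands about a day later than the entry the question marks as the second value.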