开发者

PostgreSQL: trying to find miss and mister of the last month with highest rating

At my Drupal website users can rate each other and those timestamped ratings are stored in the pref_rep table:

# select id, nice, last_rated from pref开发者_C百科_rep where nice=true
  order by last_rated desc limit 7;
           id           | nice |         last_rated
------------------------+------+----------------------------
 OK152565298368         | t    | 2011-07-07 14:26:38.325716
 OK452217781481         | t    | 2011-07-07 14:26:10.831353
 OK524802920494         | t    | 2011-07-07 14:25:28.961652
 OK348972427664         | t    | 2011-07-07 14:25:17.214928
 DE11873                | t    | 2011-07-07 14:25:05.303104
 OK335285460379         | t    | 2011-07-07 14:24:39.062652
 OK353639875983         | t    | 2011-07-07 14:23:33.811986

Also I keep the gender of each user in the pref_users table:

# select id, female from pref_users limit 7;
       id       | female
----------------+--------
 OK351636836012 | f
 OK366097485338 | f
 OK251293359874 | t
 OK7848446207   | f
 OK335478250992 | t
 OK355400714550 | f
 OK146955222542 | t

I'm trying to create 2 Drupal blocks displaying "Miss last month" and "Mister last month", but my question is not about Drupal, so please don't move it to drupal.stackexchange.com ;-)

My question is about SQL: how could I find the user with the highest count of nice - and that for the last month? I would have 2 queries - one for female and one for non-female.

Using PostgreSQL 8.4.8 / CentOS 5.6 and SQL is sometimes so hard :-)

Thank you! Alex

UPDATE:

I've got a nice suggestion to cast timestamps to strings in order to find records for the last month (not for the last 30 days)

UPDATE2:

I've ended up doing string comparison:

select r.id,
        count(r.id),
        u.first_name,
        u.avatar,
        u.city
from pref_rep r, pref_users u where
        r.nice=true and
        to_char(current_timestamp - interval '1 month', 'IYYY-MM') =
        to_char(r.last_rated, 'IYYY-MM') and
        u.female=true and
        r.id=u.id
group by r.id , u.first_name, u.avatar, u.city
order by count(r.id) desc
limit 1


Say you run it once on the first day of the month, and cache the results, since counting votes on every page is kinda useless.

First some date arithmetic :

SELECT now(), 
       date_trunc( 'month', now() ) - '1 MONTH'::INTERVAL, 
       date_trunc( 'month', now() );

              now              |        ?column?        |       date_trunc       
-------------------------------+------------------------+------------------------
 2011-07-07 16:24:38.765559+02 | 2011-06-01 00:00:00+02 | 2011-07-01 00:00:00+02

OK, we got the bounds for the "last month" datetime range. Now we need some window function to get the first rows per gender :

SELECT * FROM (
   SELECT *, rank( ) over (partition by gender order by score desc ) 
   FROM (
      SELECT user_id, count(*) AS score FROM pref_rep 
      WHERE nice=true 
      AND last_rated >= date_trunc( 'month', now() ) - '1 MONTH'::INTERVAL
      AND last_rated <  date_trunc( 'month', now() )
      GROUP BY user_id) s1 
   JOIN users USING (user_id)) s2 
WHERE rank=1;

Note this can give you several rows in case of ex-aequo.

EDIT :

I've got a nice suggestion to cast timestamps to strings in order to find records for the last month (not for the last 30 days)

date_trunc() works much better.

If you make 2 queries, you'll have to make the count() twice. Since users can potentially vote many times for other users, that table will probably be the larger one, so scanning it once is a good thing.

You can't "leave joining back onto the users table to the outer part of the query too" because you need genders...

Query above takes about 30 ms with 1k users and 100k votes so you'd definitely want to cache it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜