开发者

MySQL query - possible to include this clause?

开发者_如何学C

I have the following query, which retrieves 4 adverts from certain categories in a random order.

At the moment, if a user has more than 1 advert, then potentially all of those ads might be retrieved - I need to limit it so that only 1 ad per user is displayed.

Is this possible to achieve in the same query?

SELECT      a.advert_id, a.title, a.url, a.user_id, 
            FLOOR(1 + RAND() * x.m_id) 'rand_ind' 

FROM        adverts AS a
INNER JOIN  advert_categories AS ac
ON          a.advert_id = ac.advert_id,
(
            SELECT MAX(t.advert_id) - 1 'm_id' 
            FROM adverts t
)           x

WHERE       ac.category_id IN 
(
            SELECT category_id
            FROM website_categories
            WHERE website_id = '8'
)
AND         a.advert_type = 'text'

GROUP BY    a.advert_id
ORDER BY    rand_ind 
LIMIT       4


Note: The solution is the last query at the bottom of this answer.

Test Schema and Data

create table adverts (
    advert_id int primary key, title varchar(20), url varchar(20), user_id int, advert_type varchar(10))
;
create table advert_categories (
    advert_id int, category_id int, primary key(category_id, advert_id))
;
create table website_categories (
    website_id int, category_id int, primary key(website_id, category_id))
;
insert website_categories values
    (8,1),(8,3),(8,5),
    (1,1),(2,3),(4,5)
;
insert adverts (advert_id, title, user_id) values
    (1, 'StackExchange', 1),
    (2, 'StackOverflow', 1),
    (3, 'SuperUser', 1),
    (4, 'ServerFault', 1),
    (5, 'Programming', 1),
    (6, 'C#', 2),
    (7, 'Java', 2),
    (8, 'Python', 2),
    (9, 'Perl', 2),
   (10, 'Google', 3)
;
update adverts set advert_type = 'text'
;
insert advert_categories values
    (1,1),(1,3),
    (2,3),(2,4),
    (3,1),(3,2),(3,3),(3,4),
    (4,1),
    (5,4),
    (6,1),(6,4),
    (7,2),
    (8,1),
    (9,3),
   (10,3),(10,5)
;

Data properties

  • each website can belong to multiple categories
  • for simplicity, all adverts are of type 'text'
  • each advert can belong to multiple categories. If a website has multiple categories that are matched multiple times in advert_categories for the same user_id, this causes the advert_id's to show twice when using a straight join between 3 tables in the next query.

This query joins the 3 tables together (notice that ids 1, 3 and 10 each appear twice)

select *
from website_categories wc
inner join advert_categories ac on wc.category_id = ac.category_id
inner join adverts a on a.advert_id = ac.advert_id and  a.advert_type = 'text'
where wc.website_id='8'
order by a.advert_id

To make each website show only once, this is the core query to show all eligible ads, each only once

        select *
        from adverts a
        where a.advert_type = 'text'
          and exists (
            select *
            from website_categories wc
            inner join advert_categories ac on wc.category_id = ac.category_id
            where wc.website_id='8'
              and a.advert_id = ac.advert_id)

The next query retrieves all the advert_id's to be shown

select advert_id, user_id
from (
    select
        advert_id, user_id,
        @r := @r + 1 r
    from (select @r:=0) r
    cross join 
    (
        # core query -- vvv
        select a.advert_id, a.user_id
        from adverts a
        where a.advert_type = 'text'
          and exists (
            select *
            from website_categories wc
            inner join advert_categories ac on wc.category_id = ac.category_id
            where wc.website_id='8'
              and a.advert_id = ac.advert_id)
        # core query -- ^^^
        order by rand()
    ) EligibleAdsAndUserIDs
) RowNumbered
group by user_id
order by r
limit 2

There are 3 levels to this query

  1. aliased EligibleAdsAndUserIDs: core query, sorted randomly using order by rand()
  2. aliased RowNumbered: row number added to core query, using MySQL side-effecting @variables
  3. the outermost query forces mysql to collect rows as numbered randomly in the inner queries, and group by user_id causes it to retain only the first row for each user_id. limit 2 causes the query to stop as soon as two distinct user_id's have been encountered.

This is the final query which takes the advert_id's from the previous query and joins it back to table adverts to retrieve the required columns.

  1. only once per user_id
  2. feature user's with more ads proportionally (statistically) to the number of eligible ads they have

Note: Point (2) works because the more ads you have, the more likely you will hit the top placings in the row numbering subquery

select a.advert_id, a.title, a.url, a.user_id
from
(
    select advert_id
    from (
        select
            advert_id, user_id,
            @r := @r + 1 r
        from (select @r:=0) r
        cross join 
        (
            # core query -- vvv
            select a.advert_id, a.user_id
            from adverts a
            where a.advert_type = 'text'
              and exists (
                select *
                from website_categories wc
                inner join advert_categories ac on wc.category_id = ac.category_id
                where wc.website_id='8'
                  and a.advert_id = ac.advert_id)
            # core query -- ^^^
            order by rand()
        ) EligibleAdsAndUserIDs
    ) RowNumbered
    group by user_id
    order by r
    limit 2
) Top2
inner join adverts a on a.advert_id = Top2.advert_id;


I'm thinking through something but don't have MySQL available.. can you try this query to see if it works or crashes...

SELECT 
      PreQuery.user_id, 
      (select max( tmp.someRandom ) from PreQuery tmp where tmp.User_ID = PreQuery.User_ID ) MaxRandom
   from 
      ( select adverts.user_id,
               rand() someRandom
           from adverts, advert_categories
           where adverts.advert_id = advert_categories.advert_id ) PreQuery

If the "tmp" alias is recognized as a temp buffer of the preliminary query as defined by the OUTER FROM clause, I might have something that will work... I think the field as a select statement from a queried from WONT work, but if it does, I know I'll have something solid for you.


Ok, this one might make the head hurt a bit, but lets get the logical thing going... The inner most "Core Query" is a basis that gets all unique and randomly assigned QUALIFIED Users that have a qualifying ad base on the category chosen, and type = 'text'. Since the order is random, I don't care what the assigned sequence is, and order by that. The limit 4 will return the first 4 entries that qualify. This is regardless of one user having 1 ad vs another having 1000 ads.

Next, join to the advertisements, reversing the table / join qualifications... but by having a WHERE - IN SUB-SELECT, the sub-select will be on each unique USER ID that was qualified by the "CoreQuery" and will ONLY be done 4 times based on ITs inner limit. So even if 100 users with different advertisements, we get 4 users.

Now, the Join to the CoreQuery is the Advert Table based on the same qualifying user. Typically this would join ALL records against the core query given they are for the same user in question... This is correct... HOWEVER, the NEXT WHERE clause is what filters it down to only ONE ad for the given person.

The Sub-Select is making sure its "Advert_ID" matches the one selected in the sub-select. The sub-select is based ONLY on the current "CoreQuery.user_ID" and gets ALL the qualifying category / ads for the user (wrong... we don't want ALL ads)... So, by adding an ORDER BY RAND() will randomize only this one person's ads in the result set... then Limiting THAT by 1 will only give ONE of their qualified ads...

So, the CoreQuery restricts down to 4 users. Then for each qualified user ID, gets only 1 of the qualified ads (by its inner order by RAND() and LIMIT 1 )...

Although I don't have MySQL to try, the queries are COMPLETELY legit and hope it works for you.... man, I love brain teasers like this...

SELECT
       ad1.*
   from 
       ( SELECT ad.user_id,
                count(*) as UserAdCount,
                RAND() as ANYRand
             from 
                website_categories wc
                   inner join advert_categories ac
                      ON wc.category_id = ac.category_id
                      inner join adverts ad
                         ON    ac.advert_id = ad.advert_id 
                           AND ad.advert_type = 'text'
             where
                wc.website_id = 8
             GROUP BY 
                1
             order by
                3 
             limit 
                4 ) CoreQuery,
         adverts ad1
     WHERE
             ad1.advert_type = 'text'
         AND CoreQuery.User_ID = ad1.User_ID
         AND ad1.advert_id in 
                ( select 
                          ad2.advert_id
                     FROM 
                          adverts ad2,
                          advert_categories ac2,
                          website_categories wc2
                     WHERE
                            ad2.user_id = CoreQuery.user_id
                        AND ad2.advert_id = ac2.advert_id
                        AND ac2.category_id = wc2.category_id
                        AND wc2.website_id = 8
                     ORDER BY 
                        RAND()
                     LIMIT 
                        1 )


I like to suggest that you do the random with php. This is way faster than doing it in mySQL.

"However, when the table is large (over about 10,000 rows) this method of selecting a random row becomes increasingly slow with the size of the table and can create a great load on the server. I tested this on a table I was working that contained 2,394,968 rows. It took 717 seconds (12 minutes!) to return a random row." http://www.greggdev.com/web/articles.php?id=6


set @userid = -1;
select
  a.id,
  a.title,
  case when @userid = a.userid then
    0
  else
    1
  end as isfirst,
  (@userid := a.userid)
from
  adverts a
  inner join advertcategories ac on ac.advertid = a.advertid
  inner join categories c on c.categoryid = ac.categoryid
where
  c.website = 8
order by
  a.userid,
  rand()
having
  isfirst = 1
limit 4


Add COUNT(a.user_id) as owned in the main select directive and add HAVING owned < 2 after Group By

http://dev.mysql.com/doc/refman/5.5/en/select.html

I think this is the way to do it, if the one user has more than one advert then we will not select it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜