开发者

MYSQL sum() for distinct rows

I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
   开发者_高级运维    count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by)

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?


I may be wrong but from what I understand

  • conversions.id is the primary key of your table conversions
  • stats.id is the primary key of your table stats

Thus for each conversions.id you have at most one links.id impacted.

You request is a bit like doing the cartesian product of 2 sets :

[clicks]
SELECT *
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 

[conversions]
SELECT *
FROM links 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 

and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines

As you noted the number of unique conversions in your request can be obtained via a

count(distinct conversions.id) = sizeof([conversions])

this distinct manages to remove all the [clicks] lines in the cartesian product

but clearly

sum(conversions.value) = sum([conversions].value) * sizeof([clicks])

In your case, since

count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)

you have

sizeof([clicks]) = count(*)/count(distinct conversions.id)

so I would test your request with

SELECT links.id, 
   count(DISTINCT stats.id) as clicks, 
   count(DISTINCT conversions.id) as conversions, 
   sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

Keep me posted ! Jerome


Jeromes solution is actually wrong and can produce incorrect results!!

sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value

let's assume the following table

conversions
id value
1 5
1 5
1 5
2 2
3 1

the correct sum of value for distinct ids would be 8. Jerome's formula produces:

sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 9.6 != 8


For an explanation of why you were seeing incorrect numbers, read this.

I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.


Use the following query:

SELECT links.id
  , (
    SELECT COUNT(*)
    FROM stats
    WHERE links.id = stats.parent_id
  ) AS clicks
  , conversions.conversions
  , conversions.conversion_value
FROM links
LEFT JOIN (
  SELECT link_id
    , COUNT(id) AS conversions
    , SUM(conversions.value) AS conversion_value
  FROM conversions
  GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC


I use a subquery to do this. It eliminates the problems with grouping. So the query would be something like:

SELECT COUNT(DISTINCT conversions.id)
...
     (SELECT SUM(conversions.value) FROM ....) AS Vals


How about something like this:

select l.id, count(s.id) clicks, count(c.id) clicks, sum(c.value) conversion_value
from    (SELECT l.id id, l.created created,
               s.id clicks,  
               c.id conversions,  
               max(c.value) conversion_value                    
        FROM links l
        LEFT JOIN stats s ON l.id = s.parent_id
        LEFT JOIN conversions c ON l.id = c.link_id  
        GROUP BY l.id, l.created, s.id, c.id) t
order by t.created  


This will do the trick, just divide the sum with the count of conversation id which are duplicate.

SELECT a.id,
       a.clicks,
       SUM(a.conversion_value/a.conversions) AS conversion_value,
       a.conversions
FROM (SELECT links.id, 
       COUNT(DISTINCT stats.id) AS clicks, 
       COUNT(conversions.id) AS conversions, 
       SUM(conversions.value) AS conversion_value 
      FROM links 
      LEFT OUTER JOIN stats ON links.id = stats.parent_id 
      LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
      GROUP BY conversions.id,links.id
      ORDER BY links.created DESC) AS a
GROUP BY a.id


Select sum(x.value) as conversion_value,count(x.clicks),count(x.conversions)
FROM
(SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions,
       conversions.value,       
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY conversions.id) x
GROUP BY x.id 
ORDER BY x.created desc;

I believe this will give you the answer that you are looking for.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜