Use table column value for LIMIT when performing join
I have a situation where I'm performing a join between two tables, and I need a value from one table to be used as a LIMIT factor for a subquery in the join. Assume I have the following [extremely simplified] tables -
data:
experiment_id | value
--------------|--------
            1 |    2.5
            1 |    2.6
            1 |    4.5
            1 |    2.3
            1 |    3.5
            1 |    2.8
            2 |    2.3
            2 |    1.2
            2 |    1.1
            2 |    3.6
            2 |    3.8
            2 |    4.1
            2 |    7.9
            2 |    4.2
            2 |    1.0
data_clip:
experiment_id | clip_index
--------------|------------
            1 |          3
            2 |          5
I need to sum each experiment's sorted values up to a certain clip_index, which varies between experiments. So, my result table would ideally look like this:
results:
experiment_id |  sum
--------------|-------
            1 |  7.6     # => 2.3 + 2.5 + 2.8
            2 | 13.0     # => 1.0 + 1.1 + 1.2 + 2.3 + 3.6 + 3.8
Typically, I'd do this calculation with some client-side scripting (Ruby, Python, etc.), but I wanted to try doing it at the database level. Some imaginary SQL might look like this (there's plenty wrong with this query, I know, but hopefully you get the idea):
SELECT
    t0.experiment_id AS `id`,
    (SELECT SUM(x.value)
       FROM (SELECT value
               FROM data
              WHERE experiment_id = t0.experiment_id
              ORDER BY value
              LIMIT t0.clip_index) AS x) AS `sum`
FROM data_clip AS t0
Several problems:
- LIMIT must be defined with a constant (1000, 10, etc.), not a column.
- the WHERE condition in the subquery fails because it doesn't recognize the t0 table, which is external to the subquery.
My question is basically how to accomplish the variable limit and the sum between the two tables using mostly SQL. I thought about using group_concat and substring_index to isolate the values up to clip_index for each row, but then there's the issue of summing up the numbers in the resulting string ("1.2,2.3,3.2"), plus the server limitation on the size of the group_concat buffer (configurable, but values can be around ~100k per experiment). Any thoughts? Thanks.
I guess you just need to include a row number with each value selected and limit the results by that row number, something like this (note: not tested):
SELECT t0.experiment_id AS `id`,
       (SELECT SUM(x.value)
          FROM (SELECT value, @rownum := @rownum + 1 AS rownum
                  FROM data
                  JOIN (SELECT @rownum := 0) r
                 WHERE experiment_id = t0.experiment_id
                 ORDER BY value
               ) AS x
         WHERE x.rownum <= t0.clip_index
       ) AS `sum`
FROM data_clip AS t0
see: MySQL - Get row number on select
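If you happen to be on MySQL 8.0 or later, the same row-numbering idea can be written with a ROW_NUMBER() window function instead of the @rownum user variable, which also avoids referencing t0 from inside a derived table; a rough, untested sketch:

SELECT c.experiment_id AS `id`,
       SUM(x.value)    AS `sum`
FROM data_clip AS c
JOIN (SELECT experiment_id,
             value,
             ROW_NUMBER() OVER (PARTITION BY experiment_id
                                ORDER BY value) AS rn
        FROM data) AS x
  ON  x.experiment_id = c.experiment_id
 AND  x.rn <= c.clip_index
GROUP BY c.experiment_id

Because the row numbers are assigned per experiment_id inside the derived table, the outer join only has to keep rows with rn <= clip_index and sum them.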
I think this will work when all values are positive. If there are negative values, one more level is needed.
SELECT experiment_id
     , MIN(sumValue) - (MIN(cnt) - clip_index) * MIN(maxValue)
         AS sumValue
FROM
    ( SELECT e.experiment_id
           , e.clip_index
           , COUNT(*)      AS cnt
           , SUM(d2.value) AS sumValue
           , d.value       AS maxValue
      FROM data_clip AS e
        JOIN data AS d
          ON d.experiment_id = e.experiment_id
        JOIN data AS d2
          ON  d2.experiment_id = e.experiment_id
          AND d2.value <= d.value
      GROUP BY e.experiment_id
             , d.id            -- the `data` table's primary key
      HAVING COUNT(*) >= e.clip_index
    ) AS grp
GROUP BY experiment_id
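For anyone who wants to try these queries against the sample data, a minimal test setup could look like the following; the surrogate id primary key on data is an assumption (the question's table doesn't show one), but the GROUP BY d.id above needs something like it:

-- test tables; the `id` column on `data` is assumed, not part of the question
CREATE TABLE data (
    id            INT AUTO_INCREMENT PRIMARY KEY,
    experiment_id INT NOT NULL,
    value         DECIMAL(10,2) NOT NULL
);

CREATE TABLE data_clip (
    experiment_id INT PRIMARY KEY,
    clip_index    INT NOT NULL
);

-- sample data from the question
INSERT INTO data (experiment_id, value) VALUES
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0);

INSERT INTO data_clip VALUES (1, 3), (2, 5);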