Use table column value for LIMIT when performing join

I have a situation where I'm performing a join between two tables, and I need a value from one table to be used as a LIMIT factor for a subquery in the join. Assume I have the following [extremely simplified] tables -

data:

experiment_id | value
--------------|--------
       1      |  2.5
       1      |  2.6
       1      |  4.5
       1      |  2.3
       1      |  3.5
       1      |  2.8
       2      |  2.3
       2      |  1.2
       2      |  1.1
       2      |  3.6
       2      |  3.8
       2      |  4.1
       2      |  7.9
       2      |  4.2
       2      |  1.0


data_clip:

experiment_id | clip_index
--------------|------------
       1      |     3
       2      |     5

I need to sum each experiment's sorted values up to a certain clip_index, which varies between experiments. So, my result table would ideally look like this:

results:
experiment_id |  sum
--------------|-------
       1      |  7.4    # => 2.3 + 2.5 + 2.6
       2      |  9.2    # => 1.0 + 1.1 + 1.2 + 2.3 + 3.6

Typically, I'd do this calculation with some client-side scripting (Ruby, Python, etc.), but I wanted to try doing it at the DB level. Some imaginary SQL might look like this (there's all sorts of wrong with this query, I know, but hopefully you get the idea):

SELECT
  t0.experiment_id AS `id`,
  (SELECT SUM(x.value) FROM
       (SELECT   value
        FROM     data
        WHERE    experiment_id = t0.experiment_id
        ORDER BY value
        LIMIT    t0.clip_index) AS x) AS `sum`
FROM data_clip AS t0

Several problems:

  1. LIMIT must be given a constant (1000, 10, etc.), not a column value.
  2. The WHERE condition in the subquery fails because it can't see t0, which is defined outside the subquery.

My question is basically how to accomplish the variable limit and sum between the two tables using mostly SQL. I thought about using GROUP_CONCAT and SUBSTRING_INDEX to isolate the values up to clip_index for each row, but then there's the issue of summing up the resulting number string ("1.2,2.3,3.2") and the server limit on the GROUP_CONCAT buffer size (configurable, but there can be around ~100k values per experiment). Any thoughts? Thanks.
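To illustrate the GROUP_CONCAT/SUBSTRING_INDEX idea above, a rough, untested sketch of the clipping part might look like the following; it yields the clipped string (e.g. "2.3,2.5,2.6" for experiment 1), but still leaves the summing problem open:

SELECT d.experiment_id,
       -- concatenate the sorted values, then keep everything up to the clip_index-th comma-separated item
       SUBSTRING_INDEX(GROUP_CONCAT(d.value ORDER BY d.value), ',', dc.clip_index) AS clipped
FROM data AS d
JOIN data_clip AS dc ON dc.experiment_id = d.experiment_id
GROUP BY d.experiment_id, dc.clip_index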


I guess you just need to include a row number with each value selected and limit the results by that row number, something like this (note: not tested):

SELECT t0.experiment_id AS `id`,
  (SELECT SUM(x.value) FROM
       (SELECT value, @rownum := @rownum + 1 AS rownum
        FROM data
        JOIN (SELECT @rownum := 0) r
        WHERE experiment_id = t0.experiment_id
        ORDER BY value
       ) AS x
   WHERE x.rownum <= t0.clip_index
  ) AS `sum`
FROM data_clip AS t0

see: MySQL - Get row number on select
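If MySQL 8.0 or later is available (an assumption, not something the answer above relies on), the row numbering can be done with ROW_NUMBER() instead of a user variable, and the derived table no longer needs to reference t0 from inside. A sketch of the same idea written as a plain join:

SELECT dc.experiment_id, SUM(r.value) AS `sum`
FROM data_clip AS dc
JOIN (
    -- number each experiment's values from smallest to largest
    SELECT experiment_id,
           value,
           ROW_NUMBER() OVER (PARTITION BY experiment_id ORDER BY value) AS rn
    FROM data
) AS r ON r.experiment_id = dc.experiment_id
WHERE r.rn <= dc.clip_index    -- keep only the first clip_index values per experiment
GROUP BY dc.experiment_id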


I think this will work when all values are positive. If there are negative values, one more level is needed.

SELECT experiment_id
     , MIN(sumValue) - (MIN(cnt) - clip_index) * MIN(maxValue)
       AS sumValue
FROM
  ( SELECT e.experiment_id
         , e.clip_index
         , COUNT(*)         AS cnt
         , SUM(d2.value)    AS sumValue
         , d.value          AS maxValue
    FROM data_clip AS e
      JOIN data AS d
        ON d.experiment_id = e.experiment_id
      JOIN data AS d2
        ON d2.experiment_id = e.experiment_id
        AND d2.value <= d.value
    GROUP BY e.experiment_id
           , d.id                  -- assumes `data` has a primary key column `id`
    HAVING COUNT(*) >= e.clip_index
  ) AS grp
GROUP BY experiment_id
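As I read it, the inner query computes, for each value in an experiment, the running count and running sum of all values less than or equal to it; HAVING keeps only the rows whose count has reached clip_index, and the outer MIN(sumValue) picks the smallest such running sum. The (MIN(cnt) - clip_index) * MIN(maxValue) term subtracts the overshoot when ties push COUNT(*) past clip_index. For experiment 1 (clip_index = 3), the smallest qualifying row is the one for value 2.6, with cnt = 3 and sumValue = 2.3 + 2.5 + 2.6 = 7.4, giving 7.4 - (3 - 3) * 2.6 = 7.4, matching the results table above.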