Efficient query to find singular orders for a product

2023-02-20 17:17 问答作者：

I have two tables orders, and line_items with following structure:

Orders (id = PK, indexes on user_id)
-------------------------------------
id   user_id
==   ======
1     1
2     2
3     1
4     3
5     1

LineItems (id = PK, indexes on order_id and product_id)
id   order_id product_id quantity
==   ======   ========   ======
1      1         1       1
2      1         2       2
3      2         1       4
4      2         3       6
5      3         1       1
6      4         1       1
7      5         1       1

I am trying to find the most efficient way to solve the following requirements:

Given a user and a product find the LineItems belonging to Orders where given product is the only product ordered. E.g: If user_id is 1 and product_id is 1, the query should return line items 5 and 7.
Given a user and a product find the Orders where given product is the only product ordered. E.g: If user_id is 1 and product_id is 1, the query should return orders 3 and 5.

The Orders and LineItems table can have millions of rows.

I have a working solution that uses COUNT and HAVING. I am not certain that this is the most efficient solution.

Specifically, I am wondering if this can be addressed by using the technique outlined by Cletus in this answer.

Note: I am using Orders and LineItems tables to describe the s开发者_如何学编程cenario. My actual table is quite different and it not related to order etc.

Edit 2

Is this query efficient than using GROUP BY and HAVING?

SELECT A.id
FROM   LineItems A
JOIN   Orders B ON B.id = A.order_id AND B.user_id = 1
LEFT OUTER JOIN LineItems C ON C.order_id = A.order_id AND 
                               C.product_id != A.product_id   
WHERE  A.product_id = 1 AND C.id IS NULL

select o.id OrderID, MIN(i.id) LineItemID
from orders o
inner join lineitems i on i.order_id = o.id
where o.user_id= 1
group by o.id
having count(*)=1

GROUP BY, HAVING, COUNT is the most efficient for this type of query. Basically it will scan the required data fully, but only within the user's orders, but in that single pass will produce the result.

You can kill two birds with one stone, since for orders with a single line item, min(i.id) gives you the (only) LineItemID.

Indexes you NEED to have: orders.user_id, lineitems.order_id

select 
  * 
from 
  (
    select 
      *  
    from 
      LineItems   
    group by 
      order_id 
    having count(*) = 1 
  ) l 
    inner join Orders o on l.order_id = o.id and user_id =1 and product_id =1

If you have really huge project and really huge amount of data then it would be better to have "similar goods" to be precalculated, and refreshed by some scheduler (once a day, hour, week, ...) or some "trigger" (after new good was added).

It is impossible to make the queries that you mentioned (using COUNT + HAVING + GROUP BY) to be highly performant.

Count(*) =1 is special: you don't need to actually count to detect it You could for instance use NOT EXISTS to pick the wanted tuples:

SELECT id
FROM lineitems li
WHERE NOT EXISTS (
    SELECT *
    FROM lineitems nx
    WHERE nx.order_id = li.order_id
    AND nx.id <> li.id
    )
    ;

This (sub)query can be very fast (most codegenerators will detect it as an ANTI-join). The grouping (on order_id) will still be needed internally, but the counting can be omitted. (the subquery can return false once the first duplicate order_id is encountered)

继续阅读：sql

Efficient query to find singular orders for a product

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？