SQL count many-to-many values or have it counted every time new row is added?

2022-12-18 18:11

I am using MySQL (MyISAM) 5.0.41 and I have this query:

SELECT `x`.`items`.id, `x`.`items`.name, COUNT(*) AS count
    FROM `x`.`items` INNER JOIN `x`.`user_items`
    ON `x`.`items`.id = `x`.`user_items`.item_id
    GROUP BY name HAVING count > 2 ORDER BY count DESC

I have about 36,000 users, 175,000 user_items and 60,000 items which are constantly added to. So this query is getting a bit slow...

Is it better to:

Have a count field in items and update that periodically (say each time a user adds an item)
or run the query like this (slowly)..

Or is there any SQL that will populate the count field for me?

Thanks

You can use an intermediate solution:

Add a ts DATETIME column to the user_items table which would describe the time the user added the item
Add a ts DATETIME column to the users table which would describe the point of actuality (the moment the cached count was last refreshed), as well as cnt, the cached count column

Periodically update the users table with the new count and timestamp:

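INSERT
INTO    users (id, ts, cnt)
SELECT  q.user_id, q.nts, q.ncnt
FROM    (
        -- one row per user: snapshot time and item count up to that time
        SELECT  ui.user_id, NOW() AS nts, COUNT(*) AS ncnt
        FROM    user_items ui
        WHERE   ui.ts <= NOW()
        GROUP BY
                ui.user_id
        ) q
-- VALUES(col) is the value the INSERT would have stored in col
ON DUPLICATE KEY
UPDATE  ts = VALUES(ts),
        cnt = VALUES(cnt)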

Invalidate the user's timestamp when a user_items entry is deleted

Issue this query to count the items:

SELECT  u.id, u.cnt +
        (
        -- only the rows added after the cached snapshot
        SELECT  COUNT(*)
        FROM    user_items ui
        WHERE   ui.ts > u.ts
                AND ui.user_id = u.id
        ) AS total
FROM    users u

This way, only the newly added items are counted in the user_items table, which is much faster, and you won't have concurrency issues from updating the records too often.
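
The cached-count-plus-delta idea above can be tried end to end with a small sketch. This is only an illustration under assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, integer timestamps instead of DATETIME, INSERT OR REPLACE where MySQL would use ON DUPLICATE KEY UPDATE, and the function and index names are hypothetical. Invalidation on delete is not handled here.

```python
import sqlite3

def make_delta_db():
    # Hypothetical minimal schema; SQLite stands in for MySQL here.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY,
                            ts INTEGER NOT NULL,
                            cnt INTEGER NOT NULL);
        CREATE TABLE user_items (user_id INTEGER, item_id INTEGER,
                                 ts INTEGER NOT NULL);
        CREATE INDEX ix_user_items_user_ts ON user_items (user_id, ts);
    """)
    return db

def refresh_cache(db, now):
    # Periodic job: recount rows up to `now` and store the snapshot.
    db.execute("""
        INSERT OR REPLACE INTO users (id, ts, cnt)
        SELECT user_id, ?, COUNT(*)
        FROM user_items
        WHERE ts <= ?
        GROUP BY user_id
    """, (now, now))
    db.commit()

def current_counts(db):
    # Cached count plus the rows added after the snapshot.
    rows = db.execute("""
        SELECT u.id,
               u.cnt + (SELECT COUNT(*)
                        FROM user_items ui
                        WHERE ui.user_id = u.id AND ui.ts > u.ts) AS total
        FROM users u
        ORDER BY u.id
    """)
    return dict(rows.fetchall())
```

Calling refresh_cache from a scheduled job and current_counts from the page that needs the numbers keeps the expensive recount off the request path; the index on (user_id, ts) is what keeps the delta subquery cheap.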

You should start by indexing user_items.item_id and grouping on it instead of name. Strings are much slower to group by (try it out for yourself), and the index should speed things up a bit more. If that still is too slow, you could run the GROUP BY query first and then join on the items table if your DBMS execution plan isn't doing that by default.
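
As a rough sketch of that suggestion (aggregate on the indexed integer key first, then join to items only for the names), here is a version that runs against Python's built-in sqlite3. The SQLite engine, the index name, the sample rows, and the helper names are all assumptions for illustration; the SQL inside the string is close to what you would send to MySQL.

```python
import sqlite3

def top_items(db, min_count=2):
    # Aggregate on the indexed integer key first, then join to fetch names.
    return db.execute("""
        SELECT i.id, i.name, c.cnt
        FROM (SELECT item_id, COUNT(*) AS cnt
              FROM user_items
              GROUP BY item_id
              HAVING COUNT(*) > ?) AS c
        JOIN items AS i ON i.id = c.item_id
        ORDER BY c.cnt DESC, i.id
    """, (min_count,)).fetchall()

def demo_db():
    # Hypothetical sample data: apple has 3 owners, pear 2, plum 4.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user_items (user_id INTEGER, item_id INTEGER);
        CREATE INDEX ix_user_items_item ON user_items (item_id);
    """)
    db.executemany("INSERT INTO items VALUES (?, ?)",
                   [(1, "apple"), (2, "pear"), (3, "plum")])
    db.executemany("INSERT INTO user_items VALUES (?, ?)",
                   [(1, 1), (2, 1), (3, 1), (1, 2), (2, 2),
                    (1, 3), (2, 3), (3, 3), (4, 3)])
    return db
```

With this shape the join touches only the items that survive the HAVING filter, instead of joining every user_items row before grouping.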

That query is pretty much doing a full table scan every time. There is no way around that. Indexes will help by speeding up the join, but the query will just get slower and slower as your data grows.

Storing summary data, like the "count" with the "items", would be the way to go. You can do this with stored procedures or through code. As a double check, you can periodically (e.g. once per day) recompute all counts so you know they are accurate.
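
A minimal sketch of the "through code" variant with a periodic double check, assuming a hypothetical cnt column on items and using Python's built-in sqlite3 in place of MySQL. The function names are made up for the example.

```python
import sqlite3

def make_summary_db():
    # Hypothetical schema: items carries its own stored count.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT,
                            cnt INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE user_items (user_id INTEGER, item_id INTEGER);
        CREATE INDEX ix_user_items_item ON user_items (item_id);
    """)
    return db

def add_user_item(db, user_id, item_id):
    # Insert the link row and bump the stored count in one transaction.
    with db:
        db.execute("INSERT INTO user_items (user_id, item_id) VALUES (?, ?)",
                   (user_id, item_id))
        db.execute("UPDATE items SET cnt = cnt + 1 WHERE id = ?", (item_id,))

def reconcile_counts(db):
    # Periodic double check: recompute every count from the source rows.
    with db:
        db.execute("""
            UPDATE items
            SET cnt = (SELECT COUNT(*)
                       FROM user_items ui
                       WHERE ui.item_id = items.id)
        """)
```

The increment uses cnt = cnt + 1 inside the statement rather than reading the value into the application and writing it back, so two concurrent inserts do not overwrite each other's update.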

My impulse would be to leave the data in something like normal form (in other words, do not increment a "count" field), and then cache the result of the slow query at the application level.
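
One way to picture application-level caching is a small time-to-live wrapper around whatever function runs the slow query. This is a sketch only: the class name, the 60-second default, and the injectable clock are assumptions, and a real deployment might use a shared cache instead of per-process memory.

```python
import time

class CachedQuery:
    """Cache the result of a slow zero-argument callable for ttl seconds."""

    def __init__(self, run_query, ttl=60.0, clock=time.monotonic):
        self._run_query = run_query   # callable that executes the slow query
        self._ttl = ttl               # how long a result stays fresh
        self._clock = clock           # injectable so it can be tested
        self._value = None
        self._expires = None          # None means nothing cached yet

    def get(self):
        now = self._clock()
        if self._expires is None or now >= self._expires:
            # Cache miss or stale entry: run the query and remember it.
            self._value = self._run_query()
            self._expires = now + self._ttl
        return self._value
```

With this in place the database sees the GROUP BY query at most once per ttl window per process, at the cost of results that can be up to ttl seconds old.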

If caching is ineffective, because many people are doing the query, and few of them do it twice, then, yes, you can set up a stored procedure that automatically updates some row in some table. The details vary depending on DB vendor. Here's how to do it in PostgreSQL. This is the only safe way to do it (i.e., within the DB, and not from the application layer) due to race conditions.
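
To illustrate keeping the count inside the database, here is a sketch using triggers in SQLite, driven from Python's built-in sqlite3. It is not the PostgreSQL procedure the answer refers to; the cnt column, trigger names, and helper name are assumptions, and trigger syntax differs between vendors.

```python
import sqlite3

def make_trigger_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT,
                            cnt INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE user_items (user_id INTEGER, item_id INTEGER);

        -- Keep items.cnt in step with inserts into user_items.
        CREATE TRIGGER user_items_ai AFTER INSERT ON user_items
        BEGIN
            UPDATE items SET cnt = cnt + 1 WHERE id = NEW.item_id;
        END;

        -- And with deletes.
        CREATE TRIGGER user_items_ad AFTER DELETE ON user_items
        BEGIN
            UPDATE items SET cnt = cnt - 1 WHERE id = OLD.item_id;
        END;
    """)
    return db
```

Because the triggers run in the same transaction as the statement that fired them, application code only inserts and deletes user_items rows and never touches cnt directly.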

Are you really getting all 36,000 users every time that you run your query? If you're looking to find the source of a performance issue then that could be it right there.

Depending on your RDBMS you could look at things like indexed or materialized views. Including the count as part of the table and trying to maintain it will almost certainly be a mistake, especially with the small size of your database.
