MySQL INTERSECT via joins table?

2023-02-25 17:25

So essentially I have two tables, URLS and TAGS, with a has-and-belongs-to-many relationship between the two via a join table, TAGS_URLS.

A simple query to find URLs that have any of a set of tags would be:

SELECT urls.id FROM urls 
  INNER JOIN tags_urls ON urls.id=tags_urls.url_id
  INNER JOIN tags ON tags_urls.tag_id=tags.id 
WHERE tags.tag IN ("sample","tag","list");

However, I'm trying to retrieve the intersection: only the URLs that carry every tag in a set. I.e., only URLs that have the tag "sample" AND "tag" AND "list".

I have a working query, but I cannot get the query to execute in less than 30 seconds.

SELECT a.id
  FROM
    (SELECT DISTINCT urls.id FROM urls
      INNER JOIN tags_urls ON tags_urls.url_id=urls.id INNER JOIN tags ON tags.id=tags_urls.tag_id
      WHERE tags.tag = 'sample') a
  JOIN
     (SELECT DISTINCT urls.id FROM urls
      INNER JOIN tags_urls ON tags_urls.url_id=urls.id INNER JOIN tags ON tags.id=tags_urls.tag_id
      WHERE tags.tag = 'list') b
  ON a.id = b.id;

The result set is correct, but the performance is horrific.

I also currently have the data duplicated in a Redis database, as one set of URL ids per tag, so I can do something like this and get a result set VERY quickly.

SINTER "tag-sample" "tag-list"

Would it be possible, with reasonable effort, to bring the MySQL performance for this task up to the Redis levels with SINTER?

I am not 100% sure, but I think the underlying engine is creating a temporary table for each of your subselects. Depending on the size of your data, this can be quite costly. If the temporary tables are big (and they are in your case), they have to be written out to disk because they are too big to hold in memory at once. So basically your query is copying huge amounts of data as it builds two temporary tables that match the selection criteria of your two subselects. Once this is done, it finally executes the outer select, and this is most likely rather fast.

I would try to factor the subselects out for inner joins. I think the following will give you what you are looking for:

select urls.id from urls
inner join tags_urls tu1 on tu1.url_id = urls.id
inner join tags t1 on t1.id = tu1.tag_id and t1.tag = 'sample'
inner join tags_urls tu2 on tu2.url_id = urls.id
inner join tags t2 on t2.id = tu2.tag_id and t2.tag = 'list';

You would continue to add a pair of inner joins to tags_urls and tags for each tag you want to intersect with. Again, run this through EXPLAIN and make sure every join is using the right index.
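
To make the "one pair of joins per tag" pattern concrete, here is a minimal sketch that generates the query for any number of tags and runs it. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own, the helper names build_intersect_query and urls_with_all_tags are made up for this example, the sample rows are invented, and the secondary index on (tag_id, url_id) is a guess at what "the right index" could look like for this schema.

```python
import sqlite3


def build_intersect_query(n_tags):
    """Build the join-per-tag query for n_tags tags, with ? placeholders."""
    if n_tags < 1:
        raise ValueError("need at least one tag")
    lines = ["SELECT urls.id FROM urls"]
    for i in range(1, n_tags + 1):
        # One pair of joins per tag, aliased tu1/t1, tu2/t2, ...
        lines.append(f"INNER JOIN tags_urls tu{i} ON tu{i}.url_id = urls.id")
        lines.append(
            f"INNER JOIN tags t{i} ON t{i}.id = tu{i}.tag_id AND t{i}.tag = ?"
        )
    lines.append("ORDER BY urls.id")
    return "\n".join(lines)


def urls_with_all_tags(conn, tag_names):
    """Return the ids of URLs that carry every tag in tag_names."""
    sql = build_intersect_query(len(tag_names))
    return [row[0] for row in conn.execute(sql, list(tag_names))]


# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urls (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    -- The composite primary key keeps (url, tag) pairs unique, so the
    -- joins cannot produce duplicate URL ids.
    CREATE TABLE tags_urls (url_id INTEGER, tag_id INTEGER,
                            PRIMARY KEY (url_id, tag_id));
    -- Assumed secondary index for lookups that start from a tag.
    CREATE INDEX idx_tags_urls_tag ON tags_urls (tag_id, url_id);
    INSERT INTO urls (id) VALUES (1), (2), (3);
    INSERT INTO tags (id, tag) VALUES (1, 'sample'), (2, 'tag'), (3, 'list');
    INSERT INTO tags_urls (url_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 3),
        (3, 2);
""")

print(urls_with_all_tags(conn, ["sample", "list"]))         # [1, 2]
print(urls_with_all_tags(conn, ["sample", "tag", "list"]))  # [1]
```

The tag values are passed as parameters rather than pasted into the SQL text, so only the number of joins changes between calls. On MySQL you would still want to check the generated statement with EXPLAIN, since the plan depends on your real data and indexes.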

DBMSs can do pretty well with several inner joins, but as you increase the number of intersections, your performance will decrease.

You might try replacing the SQL subqueries in the second statement with joins. Robert Vieira claims in his SQL Server books that sometimes joins are faster and sometimes subqueries are faster. It is hard to believe that the same would not be true for MySQL as well. Also, if the tables hold quite a bit of data for tags other than 'list' or 'sample', you might want to copy just the rows you care about into a temp table and run your queries against that table instead. This is especially true if you are going to be running multiple queries on that data.
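
One possible reading of the temp-table suggestion is sketched below. Again this is an assumption-laden illustration rather than a tuned MySQL solution: it uses Python's sqlite3 module as a stand-in so it runs on its own, the temp table name sample_urls and the helper sample_urls_also_tagged are invented for the example, and the rows are made up. The temp table is deliberately referenced only once per query, because MySQL documents a restriction on referring to the same TEMPORARY table more than once in a single statement.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE urls (id INTEGER PRIMARY KEY);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE tags_urls (url_id INTEGER, tag_id INTEGER,
                            PRIMARY KEY (url_id, tag_id));
    INSERT INTO urls (id) VALUES (1), (2), (3);
    INSERT INTO tags (id, tag) VALUES (1, 'sample'), (2, 'tag'), (3, 'list');
    INSERT INTO tags_urls (url_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 3),
        (3, 2);

    -- Materialize the URLs tagged 'sample' once, so later queries only
    -- have to look at this (hopefully much smaller) set.
    CREATE TEMPORARY TABLE sample_urls AS
        SELECT tu.url_id
        FROM tags_urls tu
        INNER JOIN tags t ON t.id = tu.tag_id
        WHERE t.tag = 'sample';
""")


def sample_urls_also_tagged(conn, tag):
    """Return ids of URLs in sample_urls that also carry the given tag."""
    sql = """
        SELECT s.url_id
        FROM sample_urls s
        INNER JOIN tags_urls tu ON tu.url_id = s.url_id
        INNER JOIN tags t ON t.id = tu.tag_id AND t.tag = ?
        ORDER BY s.url_id
    """
    return [row[0] for row in conn.execute(sql, (tag,))]


# Several follow-up queries reuse the same temp table.
print(sample_urls_also_tagged(conn, "list"))  # [1, 2]
print(sample_urls_also_tagged(conn, "tag"))   # [1]
```

Whether this helps depends on how selective the first tag is and how many follow-up queries reuse the table; building it has the same cost as the subselect it replaces.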
