Removing duplicates from multiple self left joins

2023-01-26 01:29 问答作者：

I am dynamically generating a query like below that creates different combinations of rules by left joining (any number of times) on itself and avoiding rules with some of the same attributes as part of the joins conditions e.g.

SELECT count(*) 
FROM rules AS t1 
LEFT JOIN rules AS t2
 ON t1.id != t2.id
 A开发者_如何学JAVAND ...
LEFT JOIN rules AS t3
 ON t1.id != t2.id AND t1.id != t3.id AND t2.id != t3.id
 AND ...

I am currently removing duplicates by creating an array of ids from the joined rows then sorting and grouping by them:

SELECT sort(array[t1.id, t2.id, t3.id]) AS ids
...
GROUP BY ids

I would like to know if there is a better way of removing duplicate rows e.g.

t1.ID | t2.ID | t3.ID
---------------------
  A   |   B   |   C
  C   |   B   |   A

Should be

t1.ID | t2.ID | t3.ID
---------------------
  A   |   B   |   C

t1.ID | t2.ID | t3.ID
---------------------
  C   |   B   |   A

But not both.

EDIT: I would like to go from a permutation of rows to a combination rows.

I'd suggest rather than joining on !=, try joining on <=.

You will then have all combinations with t1.id > t2.id, t2.id > t3.id, and so on.

Rows will not be 'duplicates' because they are ordered sets, and any set containing equivalent members would necessarily result in the identical ordered set.

I think you mean you want to go from a permutation of rows to a combination rows?

If so, the select distinct answers are wrong. Select distinct will select distinct permutations. I think you have a pretty good way of doing it. The only thing I can think of, would be to concatenate the rules into a string and the sort it in place. It looks like you are using Postgresql and there is no function that does it in the built-in string functions.

If the amount of symbols were small you might be able to insert them into the array pre-sorted by inserting 'A' in index 1, 'B' into index 2, etc. Which might might the sort quicker ...

You need to get an order into your results in order to filter all duplicates out. This can be achieved by making sure that a<b<c. And once you have an order in your results, you can apply a distinct to the resultset.

` SELECT count(*) FROM rules AS t1

LEFT JOIN rules AS t2 ON t1.id != t2.id AND

LEFT JOIN rules AS t3 ON t1.id != t2.id AND t1.id != t3.id AND t2.id != t3.id ...

t1.id < t2.id and t2.id < t3.id ...

AND ...`

Difficult to understand exactly what you're trying to achieve, but to avoid the A-B-C C-B-A duplication, try this:

SELECT count(*) 
FROM rules AS t1 
LEFT JOIN rules AS t2
 ON t1.id **<** t2.id
 AND ...
LEFT JOIN rules AS t3
 ON t1.id **<** t2.id AND t1.id **<** t3.id AND t2.id **<** t3.id
 AND ...

That way, the answers are always ordered

继续阅读：duplicate-removal duplicates join optimization sql

Removing duplicates from multiple self left joins

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？