Oracle : Identifying duplicates in a table without index

2022-12-24 06:10 问答作者：

When I try to create a unique index on a large table, I get a unique contraint error. The unique index in this case is a composite key of 4 columns.

Is there an efficient way to identify the duplicates other than :

select col1, col2, col3, col4, count(*)
from Table1
group by col1, col2, col3, col4
having count(*) > 1

The explain plan above shows full table scan with extremely high cost, and just want to find if there is another way开发者_开发问答.

Thanks !

Try creating a non-unique index on these four columns first. That will take O(n log n) time, but will also reduce the time needed to perform the select to O(n log n).

You're in a bit of a bind here -- any way you slice it, the entire table has to be read in at least once. The naïve algorithm runs in O(n²) time, unless the query optimizer is clever enough to build a temporary index/table.

You can use the EXCEPTIONS INTO clause to trap the duplicated rows.

If you don't already have an EXCEPTIONS table create one using the provided script:

SQL>  @$ORACLE_HOME/rdbms/admin/ultexcpt.sql

Now you can attempt to create a unique constraint like this

alter table Table1
add  constraint tab1_uq UNIQUE (col1, col2, col3, col4)
exceptions into exceptions
/

This will fail but now your EXCEPTIONS table contains a list of all the rows whose keys contain duplicates, identified by ROWID. That gives you a basis for deciding what to do with the duplicates (delete, renumber, whatever).

edit

As others have noted you have to pay the cost of scanning the table once. This approach gives you a permanent set of the duplicated rows, and ROWID is the fastest way of accessing any given row.

Since there is no index on those columns, that query would have to do a full table scan - no other way to do it really, unless one or more of those columns is already indexed.

You could create the index as a non-unique index, then run the query to identify the duplicate rows (which should be very fast once the index is created). But I doubt if the combined time of creating the non-unique index then running the query would be any less than just running the query without the index.

In fact, you need to look for a duplicate of every single row in a table. No way to do this effectively without an index.

I don't think there is a quicker way unfortunately.

继续阅读：duplicates oracle plsql

Oracle : Identifying duplicates in a table without index

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？