Oracle bug? SELECT returns no dupes, INSERT from SELECT has duplicate rows
I'm getting some strange behaviour from an Oracle instance I'm working on. This is 11gR1 on Itanium, no RAC, nothing fancy. Ultimately I'm moving data from one Oracle instance to another in a data warehouse scenario.
I have a semi-complex view running over a DB link; 4 inner joins over large-ish tables and 5 left joins over mid-size tables.
Here's the problem: when I test the view in SQL Developer (or SQL*Plus) it seems fine, no duplication whatsoever. However, when I actually use the view to insert data into a table I get a large number of dupes.
EDIT: The data is going into an empty table. All of the tables in the query are on the database link. The only thing passed into the query is a date (e.g. INSERT INTO target SELECT * FROM view WHERE view.datecol = dQueryDate).
I've tried adding a ROW_NUMBER() function to the select statement, partitioned by the PK for the view. All rows come back numbered as 1. Again though, the same statement run as an insert generates the same dupes as before, now conveniently numbered. The number of duped rows is not the same per key: some records exist 4 times, some only once.
I find this behaviour extremely perplexing. :) It reminds me of working with Teradata, where you have SET tables (unique rows only) and MULTISET tables (duplicate rows allowed), but Oracle has no such functionality.
A select that returns rows to the client should behave identically to one that inserts those rows to another location. I can't imagine a legitimate reason for this to happen, but maybe I'm suffering from a failure of imagination. ;)
I wonder if anyone else has experienced this or if it's a bug on this platform.
SOLUTION
Thanks to @Gary, I was able to get to the bottom of this by using "EXPLAIN PLAN FOR {my query};" and "SELECT * FROM TABLE(dbms_xplan.display);". The plan that actually gets used for the INSERT is very different from the one for the SELECT.
For the SELECT most of the plan operations are 'TABLE ACCESS BY INDEX ROWID' and 'INDEX UNIQUE SCAN'. The 'Predicate Information' block contains all of the joins and filters from the query. At the end it says "Note - fully remote statement".
For the INSERT there is no reference to the indexes. The 'Predicate Information' block is just three lines and a new 'Remote SQL' block shows 9 small SQL statements.
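The comparison described above can be sketched roughly as follows. The names target, my_view and datecol and the date literal are placeholders, not the actual objects from this question; the idea is only to explain both forms of the statement and read the two plans side by side.

```sql
-- Hypothetical names: target, my_view, datecol. Substitute your own.

-- 1. Plan for the plain SELECT
EXPLAIN PLAN FOR
SELECT * FROM my_view WHERE datecol = DATE '2011-01-01';

-- Shows the plan most recently written to PLAN_TABLE
SELECT * FROM TABLE(dbms_xplan.display);

-- 2. Plan for the same SELECT wrapped in an INSERT
EXPLAIN PLAN FOR
INSERT INTO target
SELECT * FROM my_view WHERE datecol = DATE '2011-01-01';

SELECT * FROM TABLE(dbms_xplan.display);
```

In the second output, the 'Remote SQL' block and the 'Predicate Information' block are the places to check whether the filters and join conditions are still being applied.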
The database has split my query into 9 subqueries and then attempts to join them locally. By running the smaller selects I've located the source of the duplicates.
I believe this is a bug in the Oracle compiler around remote links. It creates logical flaws when rewriting the SQL. Basically the compiler is not properly applying the WHERE clause. I was just testing it and gave it an IN list of 5 keys to bring back. SELECT brings back 5 rows. INSERT puts 77,000+ rows into the target and totally ignores the IN list.
{Still looking for a way to force the correct behaviour, I may have to ask for the view to be created on the remote database although that is not ideal from a development viewpoint. I'll edit this when I've got it working…}
It seems to be an Oracle bug; we have found the following workaround:
If you want your "insert into select ..." to work like your "select ...", you can wrap your select in a subselect.
For example:
select x,y,z from table1, table2 where ...
--> no duplicate
insert into example_table
select x,y,z from table1, table2 where ...
--> duplicate error
insert into example_table
select * from (
select x,y,z from table1, table2 where ...
)
--> no duplicate
Regards
One thing that comes to mind is that generally an optimizer plan for a SELECT will prefer a FIRST_ROWS plan to give rows back to the caller early, but an INSERT...SELECT will prefer an ALL_ROWS plan as it is going to have to deliver the full dataset. I'd check the query plans using DBMS_XPLAN.DISPLAY_CURSOR (using the sql_id from V$SQL).
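A minimal sketch of that check, assuming you have SELECT access to the V$ views and that the statement is still in the shared pool. The LIKE pattern and the sql_id literal are placeholders; use the sql_id and child_number that the first query returns.

```sql
-- Find the cursor for the INSERT (the LIKE pattern is a placeholder)
SELECT sql_id, child_number, sql_text
FROM   v$sql
WHERE  sql_text LIKE 'INSERT INTO target%';

-- Show the plan that was actually executed for that cursor.
-- '0abc123def456' and 0 are placeholders for the sql_id and
-- child_number returned by the query above.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY_CURSOR('0abc123def456', 0, 'TYPICAL'));
```

Unlike EXPLAIN PLAN, this reports the plan the database really used rather than a predicted one, which matters when the two forms of the statement are optimized differently.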
I have a semi-complex view running over a DB link; 4 inner joins over large-ish tables and 5 left joins over mid-size tables. ... All of the tables in the query are on the database link
Again, a potential trouble-spot. If all the tables in the SELECT were on the other end of the DB link, the whole query would be sent to the remote database and the resultset returned. Once you throw the INSERT in, it is more likely that the local database will take charge of the query and pull all the data from the child tables over. But that may depend on whether the view is defined in the local database or the remote database. In the latter case, as far as the local optimizer is concerned there is just one remote object and it gets data from that, and the remote database will do the join.
What happens if you just go to the remote DB and do the INSERT on a table there ?
This is a bug in Oracle's handling of joins over DB links. I have a simpler situation which does not involve an INSERT versus SELECT. If I run my query remotely, I get duplicate rows, but if I run it locally, I do not. The only difference between the queries is the "@..." appended to the tables in the remote query. I am querying a 9i database from a 10.2 database using Oracle SQL Developer 3.0.
This is even more stupid than that bug in Oracle which prevents you from joining tables with more than 1000 total columns, which is VERY easy to do when querying the ERP system. And no, the error message says nothing about tables having too many columns.
It's almost as stupid as that other Oracle database bug that prohibits querying tables containing LOB locators using ANSI syntax. Only Oracle syntax works!
Several options occur to me.
Were the dupes you see already in the destination table?
If your SELECT references the table you are inserting into, then the INSERT may be interacting with the SELECT in your combined INSERT ... SELECT ... FROM ... in such a way (Cartesian products?) as to create the duplicates.
I can't help but think that maybe you are experiencing a side-effect from something else related to the table. Are there any triggers which may be manipulating data?
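One quick way to rule triggers in or out is to query the data dictionary. This is only a sketch; TARGET is a placeholder for the destination table name (unquoted identifiers are stored in upper case).

```sql
-- Lists triggers defined on the destination table.
-- 'TARGET' is a hypothetical table name.
SELECT owner, trigger_name, triggering_event, status
FROM   all_triggers
WHERE  table_name = 'TARGET';
```

No rows back means no triggers visible to your login are defined on that table.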
How did you determine that there are no dupes in the original table?
As others have noted, this seems to be the simplest explanation for this strange behaviour.
Check your JOINs carefully. Potentially you have no duplicates in the individual tables, but underspecified joins can cause inadvertent CROSS JOINs so that your result set has duplicates due to multiplicity and, when inserted, this violates a uniqueness constraint in your destination table.
What I do in this case is to nest the query in a view or CTE and try to detect the duplicates straight from the SELECT:
WITH resultset AS (
-- blah, blah
)
SELECT a, b, c, COUNT(*)
FROM resultset
GROUP BY a, b, c
HAVING COUNT(*) > 1
I would suggest getting a plan on the query you are running and looking for a CARTESIAN JOIN in there. This could indicate a missing condition that is causing duplicated rows.
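As a rough sketch of that check, you can tag the plan with a statement id and search PLAN_TABLE for Cartesian operations. The names target, my_view and datecol and the id 'dupe_check' are placeholders.

```sql
-- Hypothetical names: target, my_view, datecol, 'dupe_check'.
EXPLAIN PLAN SET STATEMENT_ID = 'dupe_check' FOR
INSERT INTO target
SELECT * FROM my_view WHERE datecol = DATE '2011-01-01';

-- A Cartesian merge join appears as operation 'MERGE JOIN'
-- with options 'CARTESIAN'.
SELECT id, operation, options, object_name
FROM   plan_table
WHERE  statement_id = 'dupe_check'
AND    options LIKE '%CARTESIAN%'
ORDER  BY id;
```

An empty result does not prove the joins are fully specified, since a join missing a condition can still run as a hash or nested-loops join; it only rules out the explicit Cartesian step.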
As @Pop has already suggested, this behaviour could happen if the login you use in SQL*Plus differs from the login your insert runs under (that is, if the other login has a table/view/synonym with the same name).
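To check for that, you could run something like the following under each login and compare the results. MY_VIEW is a placeholder for the object name used in the query.

```sql
-- Every visible object with this name, by owner.
-- 'MY_VIEW' is a hypothetical name.
SELECT owner, object_name, object_type
FROM   all_objects
WHERE  object_name = 'MY_VIEW'
ORDER  BY owner;

-- Synonyms with this name and what they point at,
-- including any database link.
SELECT owner, synonym_name, table_owner, table_name, db_link
FROM   all_synonyms
WHERE  synonym_name = 'MY_VIEW';
```

If the two logins resolve the name to different owners or different database links, they are not running the same query.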