
Continuing a transaction after primary key violation error

I am doing a bulk insert of records into a database from a log file. Occasionally (~1 row out of every thousand) one of the rows violates the primary key and causes the transaction to fail. Currently, the user has to manually go through the file that caused the failure and remove the offending row before attempting to re-import. Given that there are hundreds of these files to import, this is impractical.

My question: How can I skip the insertion of records that will violate the primary key constraint, without having to do a SELECT statement before each row to see if it already exists?

Note: I am aware of the very similar question #1054695, but it appears to be a SQL Server specific answer and I am using PostgreSQL (importing via Python/psycopg2).


You can also use SAVEPOINTs in a transaction.

Pythonish pseudocode to illustrate this from the application side:

database.execute("BEGIN")
foreach data_row in input_data_dictionary:
    database.execute("SAVEPOINT bulk_savepoint")
    try:
        database.execute("INSERT", table, data_row)
    except:
        database.execute("ROLLBACK TO SAVEPOINT bulk_savepoint")
        log_error(data_row)
        error_count = error_count + 1
    else:
        database.execute("RELEASE SAVEPOINT bulk_savepoint")

if error_count > error_threshold:
    database.execute("ROLLBACK")
else:
    database.execute("COMMIT")

Edit: Here's an actual example of this in action in psql based on a slight variation of the example in the documentation (SQL statements prefixed by ">"):

> CREATE TABLE table1 (test_field INTEGER NOT NULL PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "table1_pkey" for table "table1"
CREATE TABLE

> BEGIN;
BEGIN
> INSERT INTO table1 VALUES (1);
INSERT 0 1
> SAVEPOINT my_savepoint;
SAVEPOINT
> INSERT INTO table1 VALUES (1);
ERROR:  duplicate key value violates unique constraint "table1_pkey"
> ROLLBACK TO SAVEPOINT my_savepoint;
ROLLBACK
> INSERT INTO table1 VALUES (3);
INSERT 0 1
> COMMIT;
COMMIT
> SELECT * FROM table1;  
 test_field 
------------
          1
          3
(2 rows)

Note that the value 3 was inserted after the error, but still inside the same transaction!

The documentation for SAVEPOINT is at http://www.postgresql.org/docs/8.4/static/sql-savepoint.html.


I would use a stored procedure to catch the unique-violation exceptions. Example:

CREATE OR REPLACE FUNCTION my_insert(i_foo text, i_bar text)
  RETURNS boolean LANGUAGE plpgsql AS
$BODY$
BEGIN
    INSERT INTO foo(x, y) VALUES (i_foo, i_bar);
    RETURN true;
EXCEPTION
    WHEN unique_violation THEN
        -- duplicate key: silently skip this row
        RETURN false;
END;
$BODY$;

SELECT my_insert('value 1','another value');
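
From the importer you would then call the function once per row. A hedged sketch, assuming a psycopg2 connection conn with cursor cur and an iterable of (foo, bar) tuples named rows:

# Each call either inserts the row or silently skips a duplicate.
for foo_value, bar_value in rows:
    cur.execute("SELECT my_insert(%s, %s)", (foo_value, bar_value))
conn.commit()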


You can roll back the whole transaction, or roll back to a savepoint set just before the code that raises the exception (cr is the cursor):

import uuid

name = uuid.uuid1().hex
cr.execute('SAVEPOINT "%s"' % name)
try:
    # your failing query goes here
except Exception:
    cr.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
    # your alternative code goes here 
else:
    cr.execute('RELEASE SAVEPOINT "%s"' % name)

This code assumes there is a running transaction; otherwise you would not receive that error message.

The Django PostgreSQL backend creates cursors directly from psycopg. Maybe in the future they will add a proxy class for the Django cursor, similar to Odoo's cursor, which extends the psycopg cursor with the following code (self is the cursor):

@contextmanager
@check
def savepoint(self):
    """context manager entering in a new savepoint"""
    name = uuid.uuid1().hex
    self.execute('SAVEPOINT "%s"' % name)
    try:
        yield
    except Exception:
        self.execute('ROLLBACK TO SAVEPOINT "%s"' % name)
        raise
    else:
        self.execute('RELEASE SAVEPOINT "%s"' % name)

With that context manager your code becomes simpler:

try:
    with cr.savepoint():
        # your failing query goes here
except Exception:
    # your alternative code goes here 

and the code is more readable, because the savepoint handling is kept out of it.


Or you can use SSIS and have the failed rows take a different path than the successful ones.

Since you are using a different database, can you bulk insert the files into a staging table and then use SQL to select only those records which do not have an existing id?
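
In PostgreSQL terms, that staging approach might look like the following hedged sketch. The staging_log and log_entries tables, their columns, the file name, and the conn/cur objects are assumptions for illustration.

# Load the raw file into an empty staging table, then copy across only
# the rows whose key is not already in the target table.
with open("logfile.csv") as f:
    cur.copy_expert(
        "COPY staging_log (entry_id, message) FROM STDIN WITH CSV", f)

cur.execute("""
    INSERT INTO log_entries (entry_id, message)
    SELECT DISTINCT ON (s.entry_id) s.entry_id, s.message  -- also drops in-file duplicates
    FROM staging_log s
    WHERE NOT EXISTS (
        SELECT 1 FROM log_entries t WHERE t.entry_id = s.entry_id
    )
""")
cur.execute("TRUNCATE staging_log")
conn.commit()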
