
Faster way to move large amounts of data from Oracle to an external database

I have a Java program that compares and moves data from a local Oracle database to an external MS SQL database hosted by an online vendor.

Currently the program connects to the MS SQL database, runs a SELECT * FROM myTABLE, and puts the results in a two-dimensional array. A temp table is then created in Oracle and the data is inserted into the temp table using a for loop:

for (int ii = 0; ii < arr.length; ii++){
    query = "insert into myTable_temp values " + values_from_array;
    stmt.executeUpdate(query);
}

A MINUS statement is used to find the differences:

query = "select * from ora_table minus select * from myTable_temp";
rs = stmt.executeQuery();

and the result is then inserted back into the MS SQL table using an insert_into_mssql() method:

while (rs.next()){
    query = "insert into myTable values " + values_from_rs;
    insert_into_mssql(query);
}

This works fine, except one table I am using has 220,000 rows. So it takes a long time to load the array, insert into the temp table, run the MINUS statement, and then insert everything back into the MS SQL table.

Is there a more efficient way to do this, or a way to make it faster? (A dblink or any direct connection won't work because of security restrictions.)


Yes. What version of MS SQL Server are you using? Depending on that, you can look into SSIS or other DTS software. It allows you to do exactly what you want very quickly.


I think it is a good idea to create a dblink.


Maybe this will help you.

The addBatch() method of Statement allows you to lump multiple update statements together as a unit and execute them at once. Call addBatch() after you create the statement, and before execution:

con.setAutoCommit(false); // If some fail, we want to roll back the rest
Statement stmt = con.createStatement();

stmt.addBatch(
    "INSERT INTO CUSTOMERS VALUES (1, 'J Smith', '617 555-1323')");
stmt.addBatch(
    "INSERT INTO CUSTOMERS VALUES (2, 'A Smith', '617 555-1132')");
stmt.addBatch(
    "INSERT INTO CUSTOMERS VALUES (3, 'C Smith', '617 555-1238')");
stmt.addBatch(
    "INSERT INTO CUSTOMERS VALUES (4, 'K Smith', '617 555-7823')");

int[] upCounts = stmt.executeBatch();
con.commit();

Notice that we turn transaction auto-commit off before creating the batch.
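For row counts like 220,000, a PreparedStatement with parameter placeholders usually batches even better than concatenated SQL strings: the statement is parsed once, and there are no quoting or escaping problems. A minimal sketch, assuming the rows arrive as the two-dimensional array from the question (the table name, column count, and batch size here are illustrative):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsert {
    // Build an INSERT with one '?' placeholder per column.
    static String buildInsertSql(String table, int columnCount) {
        StringBuilder sb = new StringBuilder("INSERT INTO " + table + " VALUES (");
        for (int i = 0; i < columnCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")").toString();
    }

    // Insert all rows of a 2-D array, flushing to the server every batchSize rows.
    static void insertRows(Connection con, String table, Object[][] rows, int batchSize)
            throws SQLException {
        if (rows.length == 0) return;
        con.setAutoCommit(false); // commit once at the end, not per row
        String sql = buildInsertSql(table, rows[0].length);
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            int inBatch = 0;
            for (Object[] row : rows) {
                for (int c = 0; c < row.length; c++) {
                    ps.setObject(c + 1, row[c]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++inBatch == batchSize) { // flush a full batch
                    ps.executeBatch();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) ps.executeBatch(); // flush the remainder
        }
        con.commit();
    }
}
```

A batch size in the low thousands is a reasonable starting point; flushing periodically keeps the client and driver from buffering the whole 220,000-row batch in memory at once.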


I would maybe start by not loading it into an array, and instead streaming the result set with a technique similar to the one you already use for writing out. You could also look at parallelizing the inserts.

If I were doing this from scratch, I would bring all the Oracle data to SQL Server and do everything there (copy the table from Oracle, find the differences and insert them into SQL Server, then drop the copied table), instead of bringing all the SQL Server data to Oracle, computing just the new records, and carrying them back to SQL Server for insertion.

SQL Server also has a pretty good ETL tool, SQL Server Integration Services, but I haven't had a lot of success using it with anything besides SQL Server data sources and destinations.
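If the diff is done on the SQL Server side, T-SQL's EXCEPT operator plays the role that MINUS does in Oracle, and the insert of new rows can then happen in a single server-side statement with no extra round-trips. A sketch of building that statement, assuming the Oracle data has been staged into a SQL Server table (the name ora_table_copy is hypothetical, and both tables must have compatible column lists):

```java
public class ServerSideDiff {
    // EXCEPT is T-SQL's counterpart of Oracle's MINUS; running the whole
    // INSERT ... SELECT ... EXCEPT on SQL Server means the diff result
    // never travels back to the Java client.
    static String buildDiffInsertSql(String stagedTable, String targetTable) {
        return "INSERT INTO " + targetTable
             + " SELECT * FROM " + stagedTable
             + " EXCEPT SELECT * FROM " + targetTable;
    }
}
```

On the JDBC side this would run as one call, e.g. stmt.executeUpdate(ServerSideDiff.buildDiffInsertSql("ora_table_copy", "myTable")). Note that SELECT * relies on both tables having the same column order; listing columns explicitly is safer in practice.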


Taking into account the technological limitations (no dblinks and so on), you can try to lower the transport overhead. This can be done by maintaining synchronization state and transferring only updated rows.

If I understand correctly, you only need to transfer updated and new rows from the Oracle database to MS SQL. In the most generic variant you need to add two columns to oracle_table to hold the replication state:

last_update_time - maintained by a trigger on insert or update. The trigger sets the column to the current date and time for each changed or newly inserted row.

last_sync_time - initialized by the trigger on insert and advanced by the sync application.

For a new row, last_update_time := sysdate and last_sync_time := last_update_time - 1.

On each update of an oracle_table row, last_update_time is set to the current date and time.

Each synchronization session then performs the following sequence:

  1. Select all rows from oracle_table where last_sync_time < last_update_time and insert them into a buffer table on the Oracle server:

    begin
      execute immediate 'truncate table buffer_table';
    
      insert into buffer_table(...)
      select * from oracle_table where last_sync_time < last_update_time;
    
    end;
    
  2. Transfer all rows from buffer_table to the MS SQL server and update myTable accordingly.

  3. Advance last_sync_time for all synchronized rows to the new value:

    update 
      oracle_table o_table
    set 
      last_sync_time = ( select buf_table.last_update_time 
                         from buffer_table buf_table
                         where buf_table.Primary_Key = o_table.Primary_Key
                       )
    where
      o_table.Primary_Key in (select Primary_Key from buffer_table)
    

Note that synchronized rows may change during the synchronization process, which is why we take the timestamp value from buffer_table when updating last_sync_time.
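The selection rule in step 1 - transfer exactly the rows where last_sync_time < last_update_time - is the heart of the scheme, so here is a small sketch of that predicate on the Java side (the Row class and field names are hypothetical, standing in for the two Oracle columns described above):

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class SyncFilter {
    // Per-row replication state, mirroring the two added Oracle columns.
    static class Row {
        final String key;
        final LocalDateTime lastUpdate; // set by the trigger on insert/update
        final LocalDateTime lastSync;   // advanced by the sync application
        Row(String key, LocalDateTime lastUpdate, LocalDateTime lastSync) {
            this.key = key;
            this.lastUpdate = lastUpdate;
            this.lastSync = lastSync;
        }
    }

    // Rows needing transfer are exactly those with last_sync_time < last_update_time.
    static List<Row> pendingRows(List<Row> all) {
        List<Row> pending = new ArrayList<>();
        for (Row r : all) {
            if (r.lastSync.isBefore(r.lastUpdate)) pending.add(r);
        }
        return pending;
    }
}
```

Because new rows are created with last_sync_time one day behind last_update_time, they always satisfy the predicate and are picked up on the first session after insertion.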

P.S. I assumed that the database on the MS SQL side is read-only. Two-way synchronization can be done on the same principle with slight improvements to the algorithm.
