What is the fastest way to insert 100,000 records from one database to another?
I have a mobile application. My client has a large data set, about 100,000 records, and it is updated frequently. When we sync, we need to copy the data from one database to another.
I've attached the second database to the main one and run
insert into table select * from sync.table
This is extremely slow; it takes about 10 minutes, I think. I also noticed that the journal file grows step by step.
How can I speed this up?
EDITED 1
I have indexes off, and I have the journal off. Using
insert into table select * from sync.table
it still takes 10 minutes.
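For reference, the full sequence I run looks roughly like this (using the inventory table from the schema below; the attached file name is a placeholder):
PRAGMA journal_mode = OFF;
ATTACH DATABASE 'sync.sqlite' AS sync;
INSERT INTO inventory SELECT * FROM sync.inventory;
DETACH DATABASE sync;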
EDITED 2
If I run a query like
select id,invitem,invid,cost from inventory where itemtype = 1
order by invitem limit 50
it takes 15-20 seconds.
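To see why that query is slow, the query plan can be inspected; a quick check, assuming access to a sqlite3 shell or any query interface:
EXPLAIN QUERY PLAN
SELECT id, invitem, invid, cost FROM inventory
WHERE itemtype = 1
ORDER BY invitem LIMIT 50;
If the plan reports a temporary B-tree for the ORDER BY, a composite index on (itemtype, invitem) could serve both the filter and the sort.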
The table schema is:
CREATE TABLE inventory
('id' INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
'serverid' INTEGER NOT NULL DEFAULT 0,
'itemtype' INTEGER NOT NULL DEFAULT 0,
'invitem' VARCHAR,
'instock' FLOAT NOT NULL DEFAULT 0,
'cost' FLOAT NOT NULL DEFAULT 0,
'invid' VARCHAR,
'categoryid' INTEGER DEFAULT 0,
'pdacategoryid' INTEGER DEFAULT 0,
'notes' VARCHAR,
'threshold' INTEGER NOT NULL DEFAULT 0,
'ordered' INTEGER NOT NULL DEFAULT 0,
'supplier' VARCHAR,
'markup' FLOAT NOT NULL DEFAULT 0,
'taxfree' INTEGER NOT NULL DEFAULT 0,
'dirty' INTEGER NOT NULL DEFAULT 1,
'username' VARCHAR,
'version' INTEGER NOT NULL DEFAULT 15
)
Indexes are created like this:
CREATE INDEX idx_inventory_categoryid ON inventory (pdacategoryid);
CREATE INDEX idx_inventory_invitem ON inventory (invitem);
CREATE INDEX idx_inventory_itemtype ON inventory (itemtype);
I am wondering: isn't insert into ... select * from the fastest built-in way to do a massive data copy?
EDITED 3
SQLite is server-less, so please stop voting for a particular answer, because I'm sure that is not the answer.
If the target is some version of MS SQL Server, SqlBulkCopy offers an efficient insert for large data sets; it is similar to the bcp command-line utility.
You can also disable/remove the non-clustered indexes before inserting and re-create them afterwards, as sketched below.
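Applied to the SQLite schema from the question, the drop-and-recreate pattern would look like this (index definitions copied from the question):
DROP INDEX idx_inventory_categoryid;
DROP INDEX idx_inventory_invitem;
DROP INDEX idx_inventory_itemtype;
INSERT INTO inventory SELECT * FROM sync.inventory;
CREATE INDEX idx_inventory_categoryid ON inventory (pdacategoryid);
CREATE INDEX idx_inventory_invitem ON inventory (invitem);
CREATE INDEX idx_inventory_itemtype ON inventory (itemtype);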
In SQLite, these are usually pretty fast:
.dump ?TABLE? ... Dump the database in an SQL text format
.import FILE TABLE Import data from FILE into TABLE
Also try: PRAGMA journal_mode = OFF
FYI, you should be able to run the command line utility on Windows Mobile if you include it in your package.
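A rough session, assuming the sqlite3 shell is available (file names mirror the C example further down):
sqlite3 source.sqlite
sqlite> .output inventory.sql
sqlite> .dump inventory
sqlite> .quit
sqlite3 dest.sqlite
sqlite> .read inventory.sql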
I don't think that attaching the two databases and running INSERT INTO foo (SELECT * FROM bar) is the fastest way to do this. If you are syncing between a handheld device and a server (or another device), could the transport mechanism be the bottleneck? Or are the two database files already on the same filesystem? If the device's filesystem is on slower flash memory, could that be the bottleneck?
Are you able to compile/run the raw SQLite C code on your device? (I think the raw SQLite3 amalgamation should compile for WinCE/Mobile.) If so, and you are willing:
- to write some C code (using the SQLite C API), and
- to accept an increased risk of data loss by turning off disk journaling,
then it should be possible to write a small stand-alone executable that copies/synchronizes the 100K records between the two databases extremely quickly.
I've posted some of what I learned about optimizing SQLite inserts here: Improve INSERT-per-second performance of SQLite?
Edit: Tried this out with real code...
I don't know all the steps involved in building a Windows Mobile executable, but the SQLite3 amalgamation should compile out of the box using Visual Studio. Here is a sample main.c program that opens two SQLite databases (both must have the same schema - see the #define TABLE statement), executes a SELECT statement, and then binds the resulting rows to an INSERT statement:
/*************************************************************
** The author disclaims copyright to this source code. In place of
** a legal notice, here is a blessing:
**
** May you do good and not evil.
** May you find forgiveness for yourself and forgive others.
** May you share freely, never taking more than you give.
**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include "sqlite3.h"
#define SOURCEDB "C:\\source.sqlite"
#define DESTDB "c:\\dest.sqlite"
#define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
#define BUFFER_SIZE 256
int main(int argc, char **argv) {
sqlite3 * sourceDB;
sqlite3 * destDB;
sqlite3_stmt * insertStmt;
sqlite3_stmt * selectStmt;
char * insertTail = 0;
char * selectTail = 0;
int n = 0;
int result = 0;
char * sErrMsg = 0;
clock_t cStartClock;
char sInsertSQL [BUFFER_SIZE] = "\0";
char sSelectSQL [BUFFER_SIZE] = "\0";
/* Open the Source and Destination databases */
sqlite3_open(SOURCEDB, &sourceDB);
sqlite3_open(DESTDB, &destDB);
/* Make sure the destination table exists (schema from the TABLE define above) */
sqlite3_exec(destDB, TABLE, NULL, NULL, &sErrMsg);
/* Risky - but improves performance */
sqlite3_exec(destDB, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);
sqlite3_exec(destDB, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);
cStartClock = clock(); /* Keep track of how long this took*/
/* Prepared statements are much faster */
/* Compile the Insert statement */
sprintf(sInsertSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
sqlite3_prepare_v2(destDB, sInsertSQL, BUFFER_SIZE, &insertStmt, &insertTail);
/* Compile the Select statement */
sprintf(sSelectSQL, "SELECT * FROM TTC LIMIT 100000");
sqlite3_prepare_v2(sourceDB, sSelectSQL, BUFFER_SIZE, &selectStmt, &selectTail);
/* Transaction on the destination database */
sqlite3_exec(destDB, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);
/* Execute the Select Statement. Step through the returned rows and bind
each value to the prepared insert statement. Obviously this is much simpler
if the columns in the select statement are in the same order as the columns
in the insert statement */
result = sqlite3_step(selectStmt);
while (result == SQLITE_ROW)
{
sqlite3_bind_text(insertStmt, 1, (const char *)sqlite3_column_text(selectStmt, 1), -1, SQLITE_TRANSIENT); /* Get Route */
sqlite3_bind_text(insertStmt, 2, (const char *)sqlite3_column_text(selectStmt, 2), -1, SQLITE_TRANSIENT); /* Get Branch */
sqlite3_bind_text(insertStmt, 3, (const char *)sqlite3_column_text(selectStmt, 3), -1, SQLITE_TRANSIENT); /* Get Version */
sqlite3_bind_text(insertStmt, 4, (const char *)sqlite3_column_text(selectStmt, 4), -1, SQLITE_TRANSIENT); /* Get Stop Number */
sqlite3_bind_text(insertStmt, 5, (const char *)sqlite3_column_text(selectStmt, 5), -1, SQLITE_TRANSIENT); /* Get Vehicle */
sqlite3_bind_text(insertStmt, 6, (const char *)sqlite3_column_text(selectStmt, 6), -1, SQLITE_TRANSIENT); /* Get Day */
sqlite3_bind_text(insertStmt, 7, (const char *)sqlite3_column_text(selectStmt, 7), -1, SQLITE_TRANSIENT); /* Get Time */
sqlite3_step(insertStmt); /* Execute the SQL Insert Statement (Destination Database)*/
sqlite3_clear_bindings(insertStmt); /* Clear bindings */
sqlite3_reset(insertStmt); /* Reset VDBE */
n++;
/* Fetch the next row from the source database */
result = sqlite3_step(selectStmt);
}
sqlite3_exec(destDB, "END TRANSACTION", NULL, NULL, &sErrMsg);
printf("Transfered %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);
sqlite3_finalize(selectStmt);
sqlite3_finalize(insertStmt);
/* Close both databases */
sqlite3_close(destDB);
sqlite3_close(sourceDB);
return 0;
}
On my Windows desktop machine this code copies 100k records from source.sqlite to dest.sqlite in 1.20 seconds. I don't know exactly what kind of performance you'll see on a mobile device with flash memory (but I am curious).
I'm mobile right now so I can't post a very detailed answer, but this might be worth reading:
http://sqlite.org/cvstrac/wiki?p=SpeedComparison
As you can see, SQLite 3 performs INSERTs far faster when they are wrapped in a transaction. Also, INSERT ... FROM SELECT doesn't seem to be SQLite's strong suit.
INSERT INTO ... SELECT * from an attached database is the fastest available option in SQLite. A few things to look into:
Transactions. Make sure the entire operation happens inside a single transaction (see the sketch after this list); this is really critical. If it's only one SQL statement then it's not important, but you said the journal grows "step by step", which indicates it's more than one statement.
Triggers. Do you have triggers running? Those obviously could affect performance.
Constraints. Do you have unnecessary constraints? You can't disable them or remove/re-add them, so if they're necessary there isn't much you can do about them, but it's something to consider.
You already mentioned turning off indexes.
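A minimal sketch of the transaction-wrapped copy, plus a quick check for triggers (table names taken from the question; the attached schema name is an assumption):
SELECT name FROM sqlite_master WHERE type = 'trigger';
BEGIN;
INSERT INTO inventory SELECT * FROM sync.inventory;
COMMIT;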
Do all 100,000 records change very often, or is it only a subset that changes?
If it's a subset, consider adding an updated_since_last_sync column that gets flagged whenever a record is updated. During the next sync you then copy only the records that have actually changed, and once they are copied over, you set the flag back to zero.
In short: send only deltas, i.e. only what has changed.
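A minimal sketch, reusing the dirty flag that is already in the inventory schema (INSERT OR REPLACE assumes id values match across the two databases):
BEGIN;
INSERT OR REPLACE INTO main.inventory
SELECT * FROM sync.inventory WHERE dirty = 1;
UPDATE sync.inventory SET dirty = 0 WHERE dirty = 1;
COMMIT;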
What about storing the sync table in a separate database file? That way you would only need to copy that file in order to sync. I bet that's much faster than syncing via SQL.
If you haven't already, you need to wrap it in a transaction. It makes a significant speed difference.