How to optimize this huge and ugly query, ignoring duplicates?
I'm not an SQL expert. Here is my SQLite query on the table "Query" (key: SystemId, TopicId, DocumentId, all of which are also foreign keys). It looks up the foreign keys and inserts a row while avoiding duplicates. But it is huge and ugly, and I have to execute it thousands of times:
command.CommandText = "INSERT INTO Query (SystemId, TopicId, DocumentId) " +
"SELECT " +
"(SELECT Id FROM System WHERE Tag = @SystemTag COLLATE NOCASE), " +
"(SELECT Id FROM Topic WHERE Number = @TopicNumber COLLATE NOCASE), " +
"(SELECT Id FROM Document WHERE Number = @DocNumber COLLATE NOCASE) " +
"WHERE NOT EXISTS (SELECT 1 FROM Query WHERE " +
"SystemId = (SELECT Id FROM System WHERE Tag = @SystemTag) AND " +
"TopicId = (SELECT Id FROM Topic WHERE Number = @TopicNumber) AND " +
"DocumentId = (SELECT Id FROM Document WHERE Number = @DocNumber))";
Question: Is there any way to tell SQL "don't worry about duplicates, just ignore the insert statement"? Or maybe using variables/temporary tables, AD statements?
EDIT: Straight query:
INSERT INTO Query (SystemId, TopicId, DocumentId)
SELECT
(SELECT Id FROM System WHERE Tag = @SystemTag COLLATE NOCASE),
(SELECT Id FROM Topic WHERE Number = @TopicNumber COLLATE NOCASE),
(SELECT Id FROM Document WHERE Number = @DocNumber COLLATE NOCASE)
WHERE NOT EXISTS (SELECT 1 FROM Query WHERE
SystemId = (SELECT Id FROM System WHERE Tag = @SystemTag) AND
TopicId = (SELECT Id FROM Topic WHERE Number = @TopicNumber) AND
DocumentId = (SELECT Id FROM Document WHERE Number = @DocNumber));
To prevent inserting a duplicate, you need two things:
A table definition that identifies the columns as needing to be unique. For example:
CREATE TABLE Query (
SystemId INTEGER,
TopicId INTEGER,
DocumentId INTEGER,
PRIMARY KEY (SystemId, TopicId, DocumentId));
or
CREATE TABLE Query (
SystemId INTEGER,
TopicId INTEGER,
DocumentId INTEGER,
UNIQUE (SystemId, TopicId, DocumentId));
And a conflict clause. You can do this in one of two ways: either in your table definition (by appending ON CONFLICT IGNORE to the constraint; without a conflict clause SQLite defaults to ABORT, which rejects the duplicate with an error), or in your insert command:
INSERT OR IGNORE INTO Query...
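To make the two placements of the conflict clause concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names QueryPlain and QueryIgnore are made up for the comparison and are not part of the original schema; the foreign-key lookups are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table 1: composite key with no conflict clause.
conn.execute("""
    CREATE TABLE QueryPlain (
        SystemId INTEGER, TopicId INTEGER, DocumentId INTEGER,
        PRIMARY KEY (SystemId, TopicId, DocumentId))""")

# Hypothetical table 2: the conflict clause lives in the table definition.
conn.execute("""
    CREATE TABLE QueryIgnore (
        SystemId INTEGER, TopicId INTEGER, DocumentId INTEGER,
        PRIMARY KEY (SystemId, TopicId, DocumentId) ON CONFLICT IGNORE)""")

row = (1, 2, 3)

# Without any conflict clause, the second identical insert raises an error.
conn.execute("INSERT INTO QueryPlain VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO QueryPlain VALUES (?, ?, ?)", row)
    plain_duplicate_failed = False
except sqlite3.IntegrityError:
    plain_duplicate_failed = True

# Conflict clause on the statement: the duplicate is skipped silently.
conn.execute("INSERT OR IGNORE INTO QueryPlain VALUES (?, ?, ?)", row)

# Conflict clause on the table: a plain INSERT is skipped silently as well.
conn.execute("INSERT INTO QueryIgnore VALUES (?, ?, ?)", row)
conn.execute("INSERT INTO QueryIgnore VALUES (?, ?, ?)", row)

plain_count = conn.execute("SELECT COUNT(*) FROM QueryPlain").fetchone()[0]
ignore_count = conn.execute("SELECT COUNT(*) FROM QueryIgnore").fetchone()[0]
```

Both tables end up holding the row exactly once; the only difference is whether the "ignore" decision is made per statement or once in the schema.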
If you have your table set up with the UNIQUE constraint and a conflict clause, you really don't need to change your INSERT query (besides removing the admittedly ugly WHERE NOT EXISTS bit).
The drawback is that, yes, it makes your code attempt all sorts of insertions that get rejected. But look at it the other way: it makes your database behave the way you want it to behave. And that is key in working with databases: you don't want to do a full manual scan of a table every time you perform an operation. You want to let the database do the dirty work.
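As a rough illustration of the shortened statement, the sketch below keeps the three lookups from the question but drops the WHERE NOT EXISTS part and relies on the key plus INSERT OR IGNORE. It uses Python's sqlite3 module rather than the asker's C# code; the column types, the sample rows and the parameter names are assumptions, and "Query" is double-quoted only as a precaution because QUERY is also an SQLite keyword. Running the whole batch in one transaction is my own suggestion for the "thousands of times" part, not something the answer states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE System   (Id INTEGER PRIMARY KEY, Tag    TEXT);
    CREATE TABLE Topic    (Id INTEGER PRIMARY KEY, Number TEXT);
    CREATE TABLE Document (Id INTEGER PRIMARY KEY, Number TEXT);
    CREATE TABLE "Query" (
        SystemId INTEGER, TopicId INTEGER, DocumentId INTEGER,
        PRIMARY KEY (SystemId, TopicId, DocumentId));
    -- Made-up sample data for the lookups.
    INSERT INTO System   VALUES (1, 'sysA');
    INSERT INTO Topic    VALUES (10, '401'), (11, '402');
    INSERT INTO Document VALUES (100, 'DOC-1');
""")

# Same lookups as in the question, without the WHERE NOT EXISTS block.
insert_sql = """
    INSERT OR IGNORE INTO "Query" (SystemId, TopicId, DocumentId)
    SELECT
        (SELECT Id FROM System   WHERE Tag    = :system_tag   COLLATE NOCASE),
        (SELECT Id FROM Topic    WHERE Number = :topic_number COLLATE NOCASE),
        (SELECT Id FROM Document WHERE Number = :doc_number   COLLATE NOCASE)
"""

rows = [
    {"system_tag": "sysA", "topic_number": "401", "doc_number": "DOC-1"},
    {"system_tag": "SYSA", "topic_number": "401", "doc_number": "doc-1"},  # duplicate
    {"system_tag": "sysA", "topic_number": "402", "doc_number": "DOC-1"},
]

# One transaction for the whole batch instead of one commit per row.
with conn:
    conn.executemany(insert_sql, rows)

stored = conn.execute(
    'SELECT SystemId, TopicId, DocumentId FROM "Query" ORDER BY TopicId'
).fetchall()
```

The second row differs from the first only in letter case, so it resolves to the same three ids and is skipped without raising an error.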
Regarding your question about ignoring duplicates, you need to investigate DISTINCT.
I don't know exactly how you do this in SQLite (that is, whether there are any platform-specific syntax problems), but the common way of removing duplicates is using GROUP BY.
W3Schools has a pretty good example in generic SQL:
http://www.w3schools.com/sql/sql_groupby.asp
The other option is DISTINCT, but that can have problems in large data sets.
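For reference, this is roughly what the two options look like in SQLite, run here through Python's sqlite3 module. The Staging table and its rows are invented for the example. Note that both queries only remove duplicates from a result set; they do not by themselves stop a duplicate row from being inserted into a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table without any uniqueness constraint.
    CREATE TABLE Staging (SystemId INTEGER, TopicId INTEGER, DocumentId INTEGER);
    INSERT INTO Staging VALUES (1, 10, 100), (1, 10, 100), (1, 11, 100);
""")

# DISTINCT collapses identical result rows.
distinct_rows = conn.execute(
    "SELECT DISTINCT SystemId, TopicId, DocumentId FROM Staging "
    "ORDER BY SystemId, TopicId, DocumentId"
).fetchall()

# GROUP BY does the same and can also report how often each row occurred.
grouped_rows = conn.execute(
    "SELECT SystemId, TopicId, DocumentId, COUNT(*) FROM Staging "
    "GROUP BY SystemId, TopicId, DocumentId "
    "ORDER BY SystemId, TopicId, DocumentId"
).fetchall()
```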
Finally, as an observation you might want to look into using JOIN rather than nested SELECTs.
http://www.w3schools.com/sql/sql_join.asp
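One possible shape of the JOIN variant is sketched below with Python's sqlite3 module. The column types, sample rows and parameter names are assumptions, the INSERT OR IGNORE is borrowed from the other answer, and "Query" is double-quoted only as a precaution because QUERY is also an SQLite keyword. A side effect of joining is that a failed lookup yields no row at all, whereas a scalar subquery would yield NULL for that column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE System   (Id INTEGER PRIMARY KEY, Tag    TEXT);
    CREATE TABLE Topic    (Id INTEGER PRIMARY KEY, Number TEXT);
    CREATE TABLE Document (Id INTEGER PRIMARY KEY, Number TEXT);
    CREATE TABLE "Query" (
        SystemId INTEGER, TopicId INTEGER, DocumentId INTEGER,
        PRIMARY KEY (SystemId, TopicId, DocumentId));
    -- Made-up sample data for the lookups.
    INSERT INTO System   VALUES (1, 'sysA');
    INSERT INTO Topic    VALUES (10, '401');
    INSERT INTO Document VALUES (100, 'DOC-1');
""")

# The three lookups expressed as joins instead of nested SELECTs.
insert_sql = """
    INSERT OR IGNORE INTO "Query" (SystemId, TopicId, DocumentId)
    SELECT s.Id, t.Id, d.Id
    FROM System AS s
    JOIN Topic    AS t ON t.Number = :topic_number COLLATE NOCASE
    JOIN Document AS d ON d.Number = :doc_number   COLLATE NOCASE
    WHERE s.Tag = :system_tag COLLATE NOCASE
"""

known = {"system_tag": "SYSA", "topic_number": "401", "doc_number": "doc-1"}
unknown = {"system_tag": "missing", "topic_number": "401", "doc_number": "doc-1"}

with conn:
    conn.execute(insert_sql, known)    # inserts (1, 10, 100)
    conn.execute(insert_sql, known)    # duplicate, ignored
    conn.execute(insert_sql, unknown)  # no matching System row, nothing inserted

stored = conn.execute(
    'SELECT SystemId, TopicId, DocumentId FROM "Query"'
).fetchall()
```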