MS Access 2010: Deleting duplicates without primary key
I am working for a client who gets data in Excel spreadsheets but wants to import the data into an Access table. The fields for the data records are:
- InvoiceNum
- InvoiceDate
- Customer
- ShipDate
- Quantity
- Item
- PriceEach
He receives data twice per month, and each time he receives data, he wants to be able to import the data into a table in Access.
There are two issues that are causing me a problem: 1) There is no primary key for the data (the closest field to a primary key is "InvoiceNum", but unfortunately multiple records can have the same string for this field); 2) Duplicate records are possible, where by "duplicate records" I mean two records that have the same values for each field.
The problem is that we do not want duplicate records in the data table.
I don't know the best way to handle this. I am hoping for some suggestions concerning the following:
a) Should I store all the records in an Excel spreadsheet that is linked to the Access table? I was thinking that if I do this, then I can append each new set of data to this spreadsheet (including duplicates), then write a macro in Excel to remove duplicates (I noticed I can do this by using the "Remove Duplicates" command on the "Data" tab).
or
b) Should I store the data directly in the Access table? I can write a VBA program or a macro to import each new set of Excel data into the Access table, but is there a way to do this import that eliminates duplicates (again, there is no primary key in the table)?
or
c) Is there another option that is better than the other two above?
Thanks for any help with this! I really appreciate it!
If you can't do what @Catcall suggests (i.e., fix the process that produces the dupes), I'd do it this way:
Create a staging table in Access. Its function is just to be a buffer table for each import, and it is cleared after use.
Import into it using the method @HansUp provided (i.e., a SQL string with the connect string in the IN clause).
Then use a query with a LEFT JOIN to the existing invoices to find the ones that are new:
SELECT tblBuffer.InvoiceNum, tblBuffer.InvoiceDate, tblBuffer.Customer, tblBuffer.ShipDate, tblBuffer.Quantity, tblBuffer.Item, tblBuffer.PriceEach
FROM tblBuffer LEFT JOIN tblInvoices ON tblBuffer.InvoiceNum = tblInvoices.InvoiceNum
WHERE tblInvoices.InvoiceNum Is Null
That will give you the new invoices, and you can easily turn that into an INSERT command to insert those records:
INSERT INTO tblInvoices (InvoiceNum, InvoiceDate, Customer, ShipDate, Quantity, Item, PriceEach)
SELECT tblBuffer.InvoiceNum, tblBuffer.InvoiceDate, tblBuffer.Customer, tblBuffer.ShipDate, tblBuffer.Quantity, tblBuffer.Item, tblBuffer.PriceEach
FROM tblBuffer LEFT JOIN tblInvoices ON tblBuffer.InvoiceNum = tblInvoices.InvoiceNum
WHERE tblInvoices.InvoiceNum Is Null
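For what it's worth, here is a rough sketch of the whole sequence (clear the buffer, import, append only the new rows), adapted to the question's definition of a duplicate as a row where every field matches. It joins on all seven fields rather than on InvoiceNum alone, and uses DISTINCT so that duplicates inside a single spreadsheet collapse as well. The file path C:\Imports\invoices.xlsx and the sheet name Sheet1 are made up for illustration, and the connect string is just one common form of the IN clause, not necessarily the one @HansUp posted. Access SQL has no comment syntax, so the assumptions are listed here instead of in the code. Each statement has to be run as its own query.

```sql
DELETE FROM tblBuffer;

INSERT INTO tblBuffer (InvoiceNum, InvoiceDate, Customer, ShipDate, Quantity, Item, PriceEach)
SELECT InvoiceNum, InvoiceDate, Customer, ShipDate, Quantity, Item, PriceEach
FROM [Sheet1$] IN '' [Excel 12.0 Xml;HDR=YES;DATABASE=C:\Imports\invoices.xlsx];

INSERT INTO tblInvoices (InvoiceNum, InvoiceDate, Customer, ShipDate, Quantity, Item, PriceEach)
SELECT DISTINCT b.InvoiceNum, b.InvoiceDate, b.Customer, b.ShipDate, b.Quantity, b.Item, b.PriceEach
FROM tblBuffer AS b
LEFT JOIN tblInvoices AS i
  ON (b.InvoiceNum = i.InvoiceNum)
 AND (b.InvoiceDate = i.InvoiceDate)
 AND (b.Customer = i.Customer)
 AND (b.ShipDate = i.ShipDate)
 AND (b.Quantity = i.Quantity)
 AND (b.Item = i.Item)
 AND (b.PriceEach = i.PriceEach)
WHERE i.InvoiceNum Is Null;
```

One caveat: an equality join never matches on Null, so a row with an empty field (say, a blank ShipDate) would be treated as new on every import. If blanks are possible, that case needs separate handling.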
Now, it does occur to me given the field names that the reason there may be duplicate invoices is that this is denormalized data, and the cases where there's more than one record are actually invoices with more than one invoice item. In that case, you may need to create an invoice header table and then insert the invoice items into an invoice details table. I'll leave that as an exercise for the reader, since it's too much work to mock it up in the abstract when it may not even matter.
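If the data does turn out to be denormalized, the split might look roughly like the two append queries below. This is only a sketch under assumptions: the table names tblInvoiceHeader and tblInvoiceDetail are hypothetical, and it guesses that InvoiceDate and Customer belong to the invoice as a whole while ShipDate, Quantity, Item and PriceEach belong to each line item. The real split depends on the data. Run the header query first, then the detail query.

```sql
INSERT INTO tblInvoiceHeader (InvoiceNum, InvoiceDate, Customer)
SELECT DISTINCT b.InvoiceNum, b.InvoiceDate, b.Customer
FROM tblBuffer AS b
LEFT JOIN tblInvoiceHeader AS h ON b.InvoiceNum = h.InvoiceNum
WHERE h.InvoiceNum Is Null;

INSERT INTO tblInvoiceDetail (InvoiceNum, ShipDate, Quantity, Item, PriceEach)
SELECT DISTINCT b.InvoiceNum, b.ShipDate, b.Quantity, b.Item, b.PriceEach
FROM tblBuffer AS b
LEFT JOIN tblInvoiceDetail AS d
  ON (b.InvoiceNum = d.InvoiceNum)
 AND (b.ShipDate = d.ShipDate)
 AND (b.Quantity = d.Quantity)
 AND (b.Item = d.Item)
 AND (b.PriceEach = d.PriceEach)
WHERE d.InvoiceNum Is Null;
```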
Fix it at the root.
The root cause of the problem is whatever person or software is creating the Excel spreadsheet with the duplicate rows. The best thing you can do is eliminate the duplicates before the data gets into Excel.
If you can't do that, then remove the duplicates in Excel before you import the data into Access. (You don't have to write a macro for that.) Since you'll then have no duplicates, you will be able to establish a key for the target table. Best case, your key is InvoiceNum. In the worst case, the key will be {InvoiceNum, InvoiceDate, Customer, ShipDate, Quantity, Item, PriceEach}.
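As a sketch of the worst case, the seven-column key could be declared as a unique index on the target table. The table name tblInvoices is borrowed from the other answer and the index name idxInvoiceRow is made up; the statement assumes the table currently holds no duplicate rows, otherwise creating the index should fail.

```sql
CREATE UNIQUE INDEX idxInvoiceRow
ON tblInvoices (InvoiceNum, InvoiceDate, Customer, ShipDate, Quantity, Item, PriceEach);
```

With an index like this in place, an append query that hits an existing row should be rejected as a key violation for that row while the remaining rows are still appended. It is worth testing how rows with Null in any of the indexed fields behave, since they may not be treated as duplicates.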
All this assumes that the duplicates are meaningless. If they're (supposed to be) meaningful, then you need more columns. I can't imagine how that can happen, though.