Updating large record sets into a normalized Access database
I'm using Access 2007 to build a normalized database to replace one that uses a couple of flat many-field tables. My problem is that I often get Excel sheets with lots of updates that I import as tables and then join against an existing table to do updates. But that's going to be more difficult now that I'm normalizing. Here's an example of the VBA code for updating a value:
function updateBoxCategory(boxID As String, newCategory As String) as long
Dim boxKey As Long
Dim catKey As Long
Dim db As Database
Dim ustr As String
Set db = CurrentDb
boxKey = getKey(db, "boxes", "boxID", boxID)
'exit if box not found'
If boxKey = 0 Then
Exit Sub
End If
catKey = getKey(db, "categories", "category", newCategory)
'exit if category not found'
If catKey = 0 Then
Exit Sub
End If
ustr = "update boxes set catKey=" & catKey & " where ID=" & boxKey
db.Execute ustr, dbFailOnError
End Sub
getKey("dbObject","table","field","value") returns a unique value's primary key.
I'm concerned that if, e.g., I have to update 100,000 records, I'll have to loop this procedure's queries through every record, which means I'll be running 100,000 select queries against a table with 100,000 records--and this makes me worried about performance issues, even though everything's indexed.
Basically, my qu开发者_开发百科estion is this: is this code the appropriate way to handle updates to normalized data?
In SQL we shun such procedural code in favour of set-based solutions. The ides is that you tell the optimizer in a single SQL statement what you want to achieve and it (rather than you) decides how best to do it.
I assume you have a staging table (could be in Excel, could be a linked table) with columns for the real keys, boxID
and newCategory
. However, you cannot use these values in your Boxes
table because there is some indirection in the schema design: you need to find the 'surrogate' key values using a look up tables (I urge you to consider fixing this "feature" in your design so you can just use the real key values :)
Here's how it can be done with Standard SQL:2003 (e.g. works on SQL Server 2008):
MERGE INTO Boxes
USING (
SELECT B1.ID AS boxKey, C1.ID AS catKey
FROM YourStagingTable AS S1
INNER JOIN Boxes AS B1
ON B1.boxID = S1.boxID
INNER JOIN Categories AS C1
ON C1.category = S1.NewCategory
) AS source (
boxKey, catKey
)
ON Boxes.ID = source.boxKey
WHEN MATCHED THEN
UPDATE
SET catKey = source.catKey;
Here is the equivalent in SQL-92 Standard, which required the use of scalar subqueries:
UPDATE Boxes
SET catKey =
(
SELECT C1.ID AS catKey
FROM YourStagingTable AS S1
INNER JOIN Boxes AS B1
ON B1.boxID = S1.boxID
INNER JOIN Categories AS C1
ON C1.category = S1.NewCategory
WHERE Boxes.ID = B1.ID
)
WHERE EXISTS (
SELECT *
FROM YourStagingTable AS S1
INNER JOIN Boxes AS B1
ON B1.boxID = S1.boxID
WHERE Boxes.ID = B1.ID
);
Sadly, Access (Jet, ACE, whatever) does not support any of the modern SQL Standards even at entry level (if something from 1992 can indeed be considered 'modern' :) Rather, Access insists you use its propriety UPDATE..JOIN
syntax, with which I've never been really familiar. Hopefully the above will point you in the right direction for Access (or erhaps someone can edit this answer to add the equivalent Access dialect...?)
I'm not entirely sure what you're doing here, but if you're trying to match up a row in a table with a row in a spreadsheet and copy values that differ from the spreadsheet to the table, you need to rotate your approach 90 degrees.
That is, instead of running a SQL UPDATE for each ROW, run one for each COLUMN.
Here's code that does this. It assumes that the two tables being compared have a shared primary key and that they have the same field names (though you can write a query that aliases the fields in one to match the names in the other):
Public Function UpdateTableData(ByVal strSourceTable As String, _
ByVal strTargetTable As String, ByVal strJoinField As String, _
ByRef db As DAO.Database, Optional ByVal strExcludeFieldsList As String, _
Optional ByVal strUpdatedBy As String = "Auto Update", _
Optional strAdditionalCriteria As String) As Boolean
Dim strUpdate As String
Dim rsFields As DAO.Recordset
Dim fld As DAO.Field
Dim strFieldName As String
Dim strNZValue As String
Dim strSet As String
Dim strWhere As String
Dim STR_QUOTE = """"
strUpdate = "UPDATE " & strTargetTable & " INNER JOIN " & strSourceTable _
& " ON " & strTargetTable & "." & strJoinField & " = " _
& strSourceTable & "." & strJoinField
' if the fields don't have the same names in both tables,
' create a query that aliases the fields to have the names of the
' target table
' if the source table is in a different database and you don't
' want to create a linked table, create a query and specify
' the external database as the source of the table
' alternatively, for strTargetTable, supply a SQL string with
' the external connect string
Set rsFields = db.OpenRecordset(strSourceTable)
For Each fld In rsFields.Fields
strFieldName = fld.Name
If strFieldName <> strJoinField Or (InStr(", " & strExcludeFieldsList _
& ",", strFieldName & ",") <> 0) Then
Select Case fld.Type
Case dbText, dbMemo
strNZValue = "''"
Case Else
strNZValue = "0"
End Select
strSet = " SET " & strTargetTable & "." & strFieldName _
& " = varZLSToNull(" & strSourceTable & "." & strFieldName & ")"
strSet = strSet & ", " & strTargetTable & ".Updated = #" & Date & "#"
strSet = strSet & ", " & strTargetTable & ".UpdatedBy = " _
& STR_QUOTE & strUpdatedBy & STR_QUOTE
strWhere = " WHERE Nz(" & strTargetTable & "." & strFieldName _
& ", " & strNZValue & ") <> Nz(" & strSourceTable & "." _
& strFieldName & ", " & strNZValue & ")"
If db.TableDefs(strTargetTable).Fields(fld.Name).Required Then
strWhere = strWhere & " AND " & strSourceTable & "." _
& strFieldName & " Is Not Null"
End If
If Len(strAdditionalCriteria) > 0 Then
strWhere = strWhere & " AND " & strAdditionalCriteria
End If
Debug.Print strUpdate & strSet & strWhere
Debug.Print SQLRun(strUpdate & strSet & strWhere, dbLocal) & " " _
& strFieldName & " updated."
End If
Next fld
Debug.Print dbLocal.OpenRecordset("SELECT COUNT(*) FROM " _
& strTargetTable & " WHERE Updated=#" & Date _
& "# AND UpdatedBy=" & STR_QUOTE & strUpdatedBy & STR_QUOTE)(0) _
& " total records updated in " & strTargetTable
rsFields.Close
Set rsFields = Nothing
UpdateTableData = True
End Function
I've been using variations on that code for well over a decade, and it's much faster and more efficient than doing it row-by-row.
Do note that there are some assumptions hardwired into it (like the fact that each table has Updated and UpdatedBy fields). But it should get you started.
精彩评论