开发者

Updating large record sets into a normalized Access database

I'm using Access 2007 to build a normalized database to replace one that uses a couple of flat many-field tables. My problem is that I often get Excel sheets with lots of updates that I import as tables and then join against an existing table to do updates. But that's going to be more difficult now that I'm normalizing. Here's an example of the VBA code for updating a value:

function updateBoxCategory(boxID As String, newCategory As String) as long

Dim boxKey As Long
Dim catKey As Long
Dim db As Database
Dim ustr As String

Set db = CurrentDb


boxKey = getKey(db, "boxes", "boxID", boxID)

'exit if box not found'
If boxKey = 0 Then
 Exit Sub
End If

catKey = getKey(db, "categories", "category", newCategory)

'exit if category not found'
If catKey = 0 Then
 Exit Sub
End If

ustr = "update boxes set catKey=" & catKey & " where ID=" & boxKey
db.Execute ustr, dbFailOnError

End Sub

getKey("dbObject","table","field","value") returns a unique value's primary key.

I'm concerned that if, e.g., I have to update 100,000 records, I'll have to loop this procedure's queries through every record, which means I'll be running 100,000 select queries against a table with 100,000 records--and this makes me worried about performance issues, even though everything's indexed.

Basically, my qu开发者_开发百科estion is this: is this code the appropriate way to handle updates to normalized data?


In SQL we shun such procedural code in favour of set-based solutions. The ides is that you tell the optimizer in a single SQL statement what you want to achieve and it (rather than you) decides how best to do it.

I assume you have a staging table (could be in Excel, could be a linked table) with columns for the real keys, boxID and newCategory. However, you cannot use these values in your Boxes table because there is some indirection in the schema design: you need to find the 'surrogate' key values using a look up tables (I urge you to consider fixing this "feature" in your design so you can just use the real key values :)

Here's how it can be done with Standard SQL:2003 (e.g. works on SQL Server 2008):

MERGE INTO Boxes
   USING (
          SELECT B1.ID AS boxKey, C1.ID AS catKey
            FROM YourStagingTable AS S1
                 INNER JOIN Boxes AS B1
                    ON B1.boxID = S1.boxID
                 INNER JOIN Categories AS C1
                    ON C1.category = S1.NewCategory
         ) AS source (
                      boxKey, catKey
                     )
      ON Boxes.ID = source.boxKey
WHEN MATCHED THEN
   UPDATE
      SET catKey = source.catKey;

Here is the equivalent in SQL-92 Standard, which required the use of scalar subqueries:

UPDATE Boxes
   SET catKey = 
                (
                 SELECT C1.ID AS catKey
                   FROM YourStagingTable AS S1
                         INNER JOIN Boxes AS B1
                            ON B1.boxID = S1.boxID
                         INNER JOIN Categories AS C1
                            ON C1.category = S1.NewCategory
                  WHERE Boxes.ID = B1.ID
                ) 
 WHERE EXISTS (
               SELECT * 
                 FROM YourStagingTable AS S1
                       INNER JOIN Boxes AS B1
                          ON B1.boxID = S1.boxID
                WHERE Boxes.ID = B1.ID
              );

Sadly, Access (Jet, ACE, whatever) does not support any of the modern SQL Standards even at entry level (if something from 1992 can indeed be considered 'modern' :) Rather, Access insists you use its propriety UPDATE..JOIN syntax, with which I've never been really familiar. Hopefully the above will point you in the right direction for Access (or erhaps someone can edit this answer to add the equivalent Access dialect...?)


I'm not entirely sure what you're doing here, but if you're trying to match up a row in a table with a row in a spreadsheet and copy values that differ from the spreadsheet to the table, you need to rotate your approach 90 degrees.

That is, instead of running a SQL UPDATE for each ROW, run one for each COLUMN.

Here's code that does this. It assumes that the two tables being compared have a shared primary key and that they have the same field names (though you can write a query that aliases the fields in one to match the names in the other):

  Public Function UpdateTableData(ByVal strSourceTable As String, _
       ByVal strTargetTable As String, ByVal strJoinField As String, _
       ByRef db As DAO.Database, Optional ByVal strExcludeFieldsList As String, _
       Optional ByVal strUpdatedBy As String = "Auto Update", _
       Optional strAdditionalCriteria As String) As Boolean
    Dim strUpdate As String
    Dim rsFields As DAO.Recordset
    Dim fld As DAO.Field
    Dim strFieldName As String
    Dim strNZValue As String
    Dim strSet As String
    Dim strWhere As String
    Dim STR_QUOTE = """"

    strUpdate = "UPDATE " & strTargetTable & " INNER JOIN " & strSourceTable _
       & " ON " & strTargetTable & "." & strJoinField & " = " _
       & strSourceTable & "." & strJoinField
    ' if the fields don't have the same names in both tables,
    '   create a query that aliases the fields to have the names of the
    '   target table
    ' if the source table is in a different database and you don't
    '   want to create a linked table, create a query and specify
    '   the external database as the source of the table
    ' alternatively, for strTargetTable, supply a SQL string with
    '   the external connect string
    Set rsFields = db.OpenRecordset(strSourceTable)
    For Each fld In rsFields.Fields
      strFieldName = fld.Name
      If strFieldName <> strJoinField Or (InStr(", " & strExcludeFieldsList _
           & ",", strFieldName & ",") <> 0) Then
         Select Case fld.Type
           Case dbText, dbMemo
             strNZValue = "''"
           Case Else
             strNZValue = "0"
         End Select
         strSet = " SET " & strTargetTable & "." & strFieldName _
           & " = varZLSToNull(" & strSourceTable & "." & strFieldName & ")"
         strSet = strSet & ", " & strTargetTable & ".Updated = #" & Date & "#"
         strSet = strSet & ", " & strTargetTable & ".UpdatedBy = " _
           & STR_QUOTE & strUpdatedBy & STR_QUOTE
         strWhere = " WHERE Nz(" & strTargetTable & "." & strFieldName _
           & ", " & strNZValue & ") <> Nz(" & strSourceTable & "." _
           & strFieldName & ", " & strNZValue & ")"
         If db.TableDefs(strTargetTable).Fields(fld.Name).Required Then
            strWhere = strWhere & " AND " & strSourceTable & "." _
              & strFieldName & " Is Not Null"
         End If
         If Len(strAdditionalCriteria) > 0 Then
            strWhere = strWhere & " AND " & strAdditionalCriteria
         End If
         Debug.Print strUpdate & strSet & strWhere
         Debug.Print SQLRun(strUpdate & strSet & strWhere, dbLocal) & " " _
           & strFieldName & " updated."
      End If
    Next fld
    Debug.Print dbLocal.OpenRecordset("SELECT COUNT(*) FROM " _
      & strTargetTable & " WHERE Updated=#" & Date _
      & "# AND UpdatedBy=" & STR_QUOTE & strUpdatedBy & STR_QUOTE)(0) _
      & " total records updated in " & strTargetTable
    rsFields.Close
    Set rsFields = Nothing
    UpdateTableData = True
  End Function

I've been using variations on that code for well over a decade, and it's much faster and more efficient than doing it row-by-row.

Do note that there are some assumptions hardwired into it (like the fact that each table has Updated and UpdatedBy fields). But it should get you started.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜