Best practice when updating individual fields in a database record
I have created a Data Access Layer using .NET. Everywhere that I update a database record, I have used SQL along the lines of:
UPDATE Customer SET FirstName=:FirstName, LastName=:LastName, Address1=:Address1, Address2=:Address2,....etc
This means that every field in the record is updated, even though only one field may have changed. A colleague has taken issue with this, saying that we should only update a field if it has changed, citing bandwidth as the issue: if we had 160 fields, we pass the data for all 160. I guess I could save bandwidth between the web server and the database server if I checked whether a value had changed and generated SQL based purely on the values that were actually changed.
Between the web server and the client, I now need to pass both old and new values, so potentially I increase bandwidth there (but ASP.NET does this already anyway, and I'm not sure we can switch that off, so it's probably not an issue).
So what is best practice? Should I be worried about updating all fields in a database record? How would I go about updating only the fields that have changed?
Edit added 29 October: Does anyone know what NHibernate does? Perhaps this is an argument to invest time in learning how to do it that way.
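To make the "generate SQL based purely on the values that were actually changed" idea concrete, here is a minimal sketch. The question is about .NET, but the technique is language-agnostic, so this uses Python with an in-memory SQLite database; the `Customer` columns shown are illustrative. Note that only the *values* can be parameterised, so the column names must come from a trusted whitelist, never from user input:

```python
import sqlite3

def build_update(table, key_col, key_val, changes):
    """Build a parameterised UPDATE touching only the changed columns.

    `changes` maps column name -> new value. Column names must come
    from a trusted whitelist (e.g. the table's known schema), since
    placeholders can only stand in for values, not identifiers.
    """
    if not changes:
        return None, ()
    cols = sorted(changes)
    set_clause = ", ".join(f"{c} = ?" for c in cols)
    sql = f"UPDATE {table} SET {set_clause} WHERE {key_col} = ?"
    params = tuple(changes[c] for c in cols) + (key_val,)
    return sql, params

# Demo against an in-memory database (table and data are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, "
             "FirstName TEXT, LastName TEXT)")
conn.execute("INSERT INTO Customer VALUES (1, 'John', 'Smith')")

sql, params = build_update("Customer", "Id", 1, {"LastName": "Smythe"})
conn.execute(sql, params)
row = conn.execute("SELECT FirstName, LastName FROM Customer "
                   "WHERE Id = 1").fetchone()
print(sql)   # UPDATE Customer SET LastName = ? WHERE Id = ?
print(row)   # ('John', 'Smythe')
```

(For what it's worth, NHibernate's mapped classes support a per-class `dynamic-update` setting that makes it emit exactly this kind of changed-columns-only UPDATE; by default it updates all mapped columns.)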
Premature optimization is the root of all evil.
Is there justification for this, do you actually have a bandwidth problem?
In my experience, tracking changes can be a lot of work (i.e. a lot of code to maintain), depending on how it is implemented. There are some elegant ways, but I think what you are doing is fine unless there is some evidence to support a change.
If you were planning to implement some custom form of change tracking I would certainly not 'chat' with the database about it. Make a copy of your object locally when you first pull it from the DB and compare against that.
Perhaps as well as bandwidth, you may also want to consider multi user contention. Depending on how your app is designed, the updating all fields approach may lead to problems.
Suppose two users simultaneously load record number 1. User A changes the LastName from "Smith" to "Smythe" and saves. User B then changes Address1 from "The Street" to "Any Street" and saves.
Under your approach, this second save will also revert the LastName back to "Smith" (because that was what it was when User B loaded the record).
Following this through to its logical conclusion, you will also need to do other checking upon saving (perhaps some sort of timestamp) to ensure that the values in the database are what you think they are.
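One common way to do that check is optimistic concurrency with a version (or timestamp) column: the UPDATE only succeeds if the version is still the one you read. A minimal sketch using Python and SQLite (an integer `Version` column here stands in for something like SQL Server's `rowversion`; names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, "
             "LastName TEXT, Version INTEGER)")
conn.execute("INSERT INTO Customer VALUES (1, 'Smith', 1)")

def update_lastname(conn, cust_id, new_name, expected_version):
    """Succeed only if nobody else bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE Customer SET LastName = ?, Version = Version + 1 "
        "WHERE Id = ? AND Version = ?",
        (new_name, cust_id, expected_version))
    return cur.rowcount == 1   # 0 rows touched => someone saved in between

# Both users read the row at version 1; the second save is rejected
# instead of silently overwriting the first user's change.
ok_a = update_lastname(conn, 1, "Smythe", 1)   # User A: succeeds
ok_b = update_lastname(conn, 1, "Smith", 1)    # User B: stale, fails
print(ok_a, ok_b)   # True False
```

The application then handles the failed save by reloading the row and letting the user retry or merge.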
Assuming you have a gigabit network connection between the web server and database server, then you'd have to be updating somewhere in the region of 100 megabytes of data per second for this to even get close to being a problem. That's around 8 terabytes per day. My guess is that you aren't updating this much data, so it's probably not a problem.
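The arithmetic behind that estimate, as a quick back-of-the-envelope check (assuming ~100 MB/s sustained, a bit under the ~125 MB/s theoretical ceiling of gigabit Ethernet):

```python
# Gigabit link: 1 Gbit/s = 125 MB/s theoretical maximum.
theoretical_mb_per_sec = 1_000_000_000 / 8 / 1_000_000   # 125.0

# Assume ~100 MB/s sustained, running flat out all day.
sustained_bytes_per_sec = 100 * 1_000_000
per_day_tb = sustained_bytes_per_sec * 86_400 / 1_000_000_000_000
print(round(per_day_tb, 2))   # 8.64 -- "around 8 terabytes per day"
```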
Network bandwidth continues to improve at all levels. Unless you legitimately have a bottleneck somewhere, don't worry about it. Even then, my hunch is that the gain is small enough that you'll spend more time and effort (and risk) trying to optimize your update than you will finding performance improvements in other places.
Well, you either do it the way you have done (which works, and passes every field every time), spend time generating bespoke SQL and passing a separate statement for every single field, or generate a big SQL statement on the fly containing just the updated fields (with a risk of SQL injection if you make a mistake). You'd probably get better bandwidth savings by doing your updates inside parameterised stored procedures and passing only the field values across rather than the names.
Unless bandwidth is really a problem, I don't think you have a problem. And the SP route is a better option anyway.
Isn't one of the fields the PK of the table? Are you really updating the PK? What about foreign keys? You're updating those too?
The issue isn't bandwidth. At the very least, isolate the non-relational elements of the record and update only that set (excluding the PK and foreign keys).
I've found myself in the same situation. I'd written my own lightweight ORM/DAL to interface to a non-optimally designed database, i.e. tables with 200+ fields.
I found that there was indeed an overhead associated with attempting to update all fields for a row, rather than just those that had changed.
In my case I was able to resolve the problem by using intelligent 'setters' for the ORM properties which would record if an attempt was made to modify a particular field. This meant that when I came to commit the object back to the database, I could emit a SQL UPDATE statement which only referred to those fields marked as modified.
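The "intelligent setters" idea can be sketched like this. The original was a custom .NET ORM; this is an illustrative Python stand-in showing the mechanism (setters flag modified columns, and commit emits an UPDATE for only those columns):

```python
class DirtyTrackingRow:
    """Row object whose setters record which columns were modified,
    so the commit step can emit an UPDATE touching only those columns.
    (Illustrative sketch; names and schema are made up.)"""

    def __init__(self, **columns):
        object.__setattr__(self, "_columns", dict(columns))
        object.__setattr__(self, "_dirty", set())

    def __getattr__(self, name):
        try:
            return self._columns[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._columns[name] = value
        self._dirty.add(name)          # the "intelligent setter"

    def update_sql(self, table, key):
        """Parameterised UPDATE referring only to the modified columns."""
        cols = sorted(self._dirty - {key})
        set_clause = ", ".join(f"{c} = ?" for c in cols)
        return f"UPDATE {table} SET {set_clause} WHERE {key} = ?"

row = DirtyTrackingRow(Id=1, FirstName="John", LastName="Smith")
row.LastName = "Smythe"
print(row.update_sql("Customer", "Id"))
# UPDATE Customer SET LastName = ? WHERE Id = ?
```

Unlike the snapshot-diff approach, this flags a column as dirty even if it is set back to its original value, but it needs no copy of the loaded state, which matters with 200+ field tables.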
N.B. For SELECT statements, best practice dictates that you explicitly list the fields you want returned rather than using 'SELECT * FROM'. The stated reason is performance/bandwidth, so the same logic would apply to UPDATEs.