
Are bad data issues that common? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 9 years ago.

I've worked for clients that had a large number of distinct, small to mid-sized projects, each interacting with the others via properly defined interfaces to share data, rather than reading and writing to the same database. Each had its own separate database, its own cache, its own file servers/systems with dedicated access, so they never caused problems for one another. One of these clients is a mobile content vendor, so they're lucky in a way that they don't have to face the same problems that everyday business applications do. They can create all those separate compartments where their components happily live in isolation from the others.

However, for many business applications this is not possible. I've worked with a few clients, one of whose applications I do production support for, where "bad data issues" come up on an hourly basis. Yeah, it's that crazy. Some data run on one of the instances (lower than production, of course) a couple of weeks ago will have corrupted some other user's data, and then a data script has to be written to fix the issue. I've seen this happen so much with this client that I have to ask.

I've seen this happen at a moderate rate with other clients, but this one just seems to be an outlier.

If you're working with business applications that share a large amount of data by reading and writing to/from the same database, are "bad data issues" that common in your environment?


Bad data issues occur all the time. The only reasonably effective defense is a properly designed, normalized database, preferably interacting with the outside world only through stored procedures.


This is why it is important to put the required data rules at the database level and not the application. (Of course, it seems that many systems don't bother at the application level either.)
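A rough sketch of what "rules at the database level" can look like, using SQLite through Python's sqlite3 module; the customer table and its columns are invented purely for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE customer (
            id        INTEGER PRIMARY KEY,
            last_name TEXT NOT NULL CHECK (length(trim(last_name)) > 0),
            email     TEXT CHECK (email LIKE '%_@_%._%'),  -- crude shape check only
            currency  TEXT NOT NULL CHECK (length(currency) = 3)
        )
    """)

    # A bad row is rejected no matter which application tries to write it.
    try:
        conn.execute(
            "INSERT INTO customer (last_name, email, currency) VALUES (?, ?, ?)",
            ("", "not an email", "9A!"),
        )
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)

The point of putting the checks in the schema is that no careless application or import can get past them, because the database refuses the row itself.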

It also seems that a lot of people who design data imports don't bother to clean the data before putting it in their system. Of course it's hard to find all the possible ways to mess up the data; I've done imports for years and I still get surprised sometimes. My favorite was the company whose data entry people obviously didn't care about the field names, and the application just went to the next field when the first field was full. I got names like "McDonald, Ja" in the last name field and "mes" in the first name field.

I do data imports from many, many clients and vendors. Out of hundreds of different imports I've developed, I can think of only one or two where the data was clean. For some reason the email field seems to be particularly bad and is often used for notes instead of emails. It's really hard to send an email to "His secretary is the hot blonde."
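For what it's worth, the kind of pre-load check I wish more imports ran might look something like this; the field names and the specific rules are just assumptions for the sake of the example:

    import re

    EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

    def check_row(row):
        """Return a list of problems found in one incoming record."""
        problems = []
        last, first = row.get("last_name", ""), row.get("first_name", "")
        if not last.strip():
            problems.append("missing last name")
        # Catch names that overflowed from one field into the next,
        # e.g. last_name='McDonald, Ja' / first_name='mes'.
        if "," in last and len(first) <= 3:
            problems.append("name looks like it overflowed between fields")
        email = row.get("email", "").strip()
        if email and not EMAIL_RE.match(email):
            problems.append("email field holds free text: %r" % email)
        return problems

    for row in [
        {"last_name": "McDonald, Ja", "first_name": "mes", "email": ""},
        {"last_name": "Smith", "first_name": "Ann",
         "email": "His secretary is the hot blonde."},
    ]:
        print(row, check_row(row))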


Yes, very common. Getting the customer to understand the extent of the problem is another matter. At one customer I had to resort to writing an application which analyzed their database and beeped every time it encountered a record which didn't match their own published data format. I took the laptop with their DB installed to a meeting and ran the program, then watched all the heads at the table swivel around to stare at their DBA while my machine beeped crazily in the background. There's nothing quite like grinding the customer's nose in his own problems to gain attention.
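The scanner itself doesn't have to be fancy. Something along these lines is enough to make the point audibly; the table, column, and "published format" here are all made up, and an in-memory database stands in for a copy of the customer's data:

    import re
    import sqlite3
    import sys

    # Assumed published format: order_code is 'ORD-' followed by six digits.
    ORDER_CODE = re.compile(r"^ORD-\d{6}$")

    # Stand-in for a copy of the customer's database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_code TEXT)")
    conn.executemany("INSERT INTO orders (order_code) VALUES (?)",
                     [("ORD-000123",), ("ord123",), (None,)])

    for rowid, code in conn.execute("SELECT id, order_code FROM orders"):
        if code is None or not ORDER_CODE.match(code):
            sys.stdout.write("\a")  # the beep
            print("row %s violates the published format: %r" % (rowid, code))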


I don't think you are talking about bad data (but it would only be polite of you to answer the various questions raised in comments) but invalid data. For example, '9A!' stored in a field that is supposed to contain a 3-character ISO currency code is probably invalid data, and should have been caught at data entry time. Bad data is usually taken to be equivalent to corruption caused by disk errors etc. The former is quite common, depending on the quality of the data input applications, while the latter is pretty rare.
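Catching that sort of thing at entry time is cheap; even a lookup against the list of valid codes would do. A sketch (only a handful of ISO 4217 codes are listed here, and the function name is invented):

    # Entry-time validation sketch; only a tiny subset of ISO 4217 codes is shown.
    VALID_CURRENCIES = {"USD", "EUR", "GBP", "JPY", "CHF", "AUD"}

    def validate_currency(value):
        code = (value or "").strip().upper()
        if code not in VALID_CURRENCIES:
            raise ValueError("not a known ISO currency code: %r" % value)
        return code

    print(validate_currency("usd"))   # fine, normalised to 'USD'
    validate_currency("9A!")          # raises ValueError at data entry time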


I assume that by "bad data issues" you mean "issues of data that does not satisfy all applicable business constraints".

They can only be a consequence of two things: bad database design by the database designer (that is, either unintentional or, even worse, intentional omission of integrity constraints from the database definition), or else the inability of the DBMS to support the more complex types of database constraint, combined with a flawed program written by the programmer to enforce the DBMS-unsupported integrity constraint.

Given how poor SQL databases are at integrity constraints, and given the poor level of knowledge of data management among the average "modern programmer", yes, such issues are everywhere.
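A sketch of the second cause, under assumptions of my own (the subscription table and the rule "at most one active subscription per customer" are invented): a constraint most SQL dialects can't declare directly ends up enforced by program code, and that code is easy to get wrong.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE subscription (
                        customer_id INTEGER NOT NULL,
                        active      INTEGER NOT NULL CHECK (active IN (0, 1))
                    )""")

    def add_active_subscription(conn, customer_id):
        with conn:  # the check and the insert run in one transaction
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM subscription"
                " WHERE customer_id = ? AND active = 1",
                (customer_id,),
            ).fetchone()
            if count:
                raise ValueError("customer %s already has an active subscription"
                                 % customer_id)
            conn.execute(
                "INSERT INTO subscription (customer_id, active) VALUES (?, 1)",
                (customer_id,),
            )

    add_active_subscription(conn, 42)

    # Without proper isolation (or a partial unique index, where the DBMS offers
    # one), two concurrent callers can both pass the check -- which is exactly how
    # the "flawed program" half of the problem plays out in practice.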


If the data gets corrupted because users shut down their application in the middle of complex database updates, then transactions are your friend. This way you don't end up with an entry in the Invoice table but no entries in the InvoiceItems table. Unless committed at the end of the process, all changes are rolled back.
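As a sketch (the Invoice/InvoiceItems names come from the answer above, everything else is assumed), with Python's sqlite3 the connection object itself can act as the transaction boundary:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Invoice (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
    conn.execute("""CREATE TABLE InvoiceItems (
                        invoice_id  INTEGER NOT NULL REFERENCES Invoice(id),
                        description TEXT NOT NULL,
                        amount      REAL NOT NULL CHECK (amount >= 0)
                    )""")

    try:
        with conn:  # commits at the end of the block, rolls back on any exception
            cur = conn.execute("INSERT INTO Invoice (customer) VALUES (?)", ("ACME",))
            invoice_id = cur.lastrowid
            conn.execute(
                "INSERT INTO InvoiceItems (invoice_id, description, amount)"
                " VALUES (?, ?, ?)",
                (invoice_id, "Widgets", -5.0),  # violates the CHECK: the update fails mid-way
            )
    except sqlite3.IntegrityError:
        pass

    # Neither the invoice header nor its items survive the failed update.
    print(conn.execute("SELECT COUNT(*) FROM Invoice").fetchone())       # (0,)
    print(conn.execute("SELECT COUNT(*) FROM InvoiceItems").fetchone())  # (0,)

The same pattern applies whatever the database is: the header and its items either both land or neither does.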
