Store multiple datasets in single table
I'm brainstorming an application to handle various data integrity checks. Each individual check could query a number of production tables, evaluate the results and report an error providing data relevant to the check. For example, one check would look for customers with a scheduled payment but no remaining balance; a different check might look for credit card transactions that have been authorized but not settled for more than 3 days. Two completely unrelated checks. The dataset from the first one would contain things like customer number, scheduled payment date, payoff date, etc. The second check would have transaction number, card type, last 4 digits of card, amount, etc.
I would like to store the result datasets in a common schema so I can query for any errors from Check A for a specific customer in the last 3 months. Or, how many times has Check B returned an error for distinct transactions. Other tables would also manage issue resolution and such. The only thing I've come up with so far is a table with ~20 columns, one column relating to the specific check, one for date/time, and the remaining would be some form of varchar capable of holding any type of data. There are any number of reasons why this makes me cringe, but performance 开发者_如何学Pythonranks pretty high up there. I'm hoping to avoid separate tables for each check, but combining that with a lookup table for secondary functionality may be the only way to go.
For the curious I'm trying to keep this in the Microsoft world (VB.NET and SQL Server), but I'm open to other ideas.
You can pull out those ~20 columns into one normalised "values" table. You could have a lookup table that marries which fields are expected for each check type.
To illustrate what I'm suggesting:
check_datasets
dataset_id
(PK)check_id
(This would be the ID of the check that failed)datetime
value_types
value_type_id
(PK)name
dataset_value_types
value_type_id
(FK, PK)dataset_id
(FK, PK)
dataset_values
dataset_id
(FK, PK)value_type_id
(FK, PK)value
(This column would represent the value of data)
----------
So to explain, in the above tables:
check_datasets
is the table that holds each failed check (I am assuming that you are only recording failed checks - otherwise, you'd probably want to add a pass/successful flag to this table).value_types
is a simple lookup table to provide names for value fields.dataset_value_types
is used define which checks have which types of value fields (this is the many-to-many relationship betweencheck_datasets
table andvalue_types
).dataset_values
is the table that holds the specific values for each check.
This design provides a normalised and easy way to query every value you want for each "dataset" that is returned.
The drawback to this design is that you still have mixed datatypes in the one field (value
).
If you wanted to separate datatypes you could subtype dataset_values
into seperate tables (ie: one table for integer values, one for varchars, LOB etc).
In the example above, this would mean creating as many sub-tables you want for datatypes (eg: dataset_values_int
, dataset_values_text
, etc). Each subtable need only have two columns - one holding the value and the second referencing the parent table.
(1) dataset_values
would become a parent table and possibly could look like this:
dataset_values
dataset_value_id
(PK)dataset_id
(FK)value_type_id
(FK)datatype
(Is this value an integer, varchar, etc. This column tells you which sub-table to check for the table)
(2) An example sub-table would be:
dataset_values_varchar
dataset_value_id
(PK,FK)value
(In this table this would be a varchar - but would vary for the other subtables.)
Hmmmm. Just brainstorming here. Do you really need to store the check data? Or just a reference to be able to look it up? How about a table to store the types of checks and then a related table that stores a chekc id, the type of check, that date of the check, and any other data you might need. Then another related table that stores the check id and the record id for the records which failed the check.
An alternative is how we store our audit table data, a varchar (max for the actual data) a column that says what column it belonged to (and in your case what table it belonged to might be a separate column) and then a column that relates it to the check that was performed.
If you really feel you must store the data in one record, this might be a place to store an xml document.
精彩评论