Design question: Filterable attributes, SQL
I have two tables in my database, Operation
and Equipment
. An operation requires zero or more attributes. However, there's some logic in how the attributes are attributed:
- Operation
Foo
requires equipmentA
andB
- Operation
Bar
requires no equipment - Operation
Baz
requires equipmentB
and eitherC
orD
- Operation
Quux
requires equipment (A
orB
) and (C
orD
)
What's the best way to represent this in SQL?
I'm sure people have d开发者_运维百科one this before, but I have no idea where to start.
(FWIW, my application is built with Python and Django.)
Update 1: There will be around a thousand Operation
rows and about thirty Equipment
rows. The information is coming in CSV form similar to the description above: Quux, (A & B) | (C & D)
Update 2: The level of conjunctions & disjunctions shouldn't be too deep. The Quux
example is probably the most complicated, though there appears to be a A | (D & E & F)
case.
Think about how you'd model the operations in OO design: the operations would be subclasss of a common superclass Operation
. Each subclass would have mandatory object members for the respective equipment required by that operation.
The way to model this with SQL is Class Table Inheritance. Create a common super-table:
CREATE TABLE Operation (
operation_id SERIAL PRIMARY KEY,
operation_type CHAR(1) NOT NULL,
UNIQUE KEY (operation_id, operation_type),
FOREIGN KEY (operation_type) REFERENCES OperationTypes(operation_type)
);
Then for each operation type, define a sub-table with a column for each required equipment type. For example, OperationFoo
has a column for each of equipA
and equipB
. Since they are both required, the columns are NOT NULL
. Constrain them to the correct types by creating a Class Table Inheritance super-table for equipment too.
CREATE TABLE OperationFoo (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'F'),
equipA INT NOT NULL,
equipB INT NOT NULL,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operations(operation_d, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id)
);
Table OperationBar
requires no equipment, so it has no equip columns:
CREATE TABLE OperationBar (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'B'),
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operations(operation_d, operation_type)
);
Table OperationBaz has one required equipment equipA
, and then at least one of equipB
and equipC
must be NOT NULL
. Use a CHECK
constraint for this:
CREATE TABLE OperationBaz (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Z'),
equipA INT NOT NULL,
equipB INT,
equipC INT,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operations(operation_d, operation_type)
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
CHECK (COALESCE(equipB, equipC) IS NOT NULL)
);
Likewise in table OperationQuux
you can use a CHECK
constraint to make sure at least one equipment resource of each pair is non-null:
CREATE TABLE OperationQuux (
operation_id INT PRIMARY KEY,
operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Q'),
equipA INT,
equipB INT,
equipC INT,
equipD INT,
FOREIGN KEY (operation_id, operation_type)
REFERENCES Operations(operation_d, operation_type),
FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
FOREIGN KEY (equipD) REFERENCES EquipmentD(equip_id),
CHECK (COALESCE(equipA, equipB) IS NOT NULL AND COALESCE(equipC, equipD) IS NOT NULL)
);
This may seem like a lot of work. But you asked how to do it in SQL. The best way to do it in SQL is to use declarative constraints to model your business rules. Obviously, this requires that you create a new sub-table every time you create a new operation type. This is best when the operations and business rules never (or hardly ever) change. But this may not fit your project requirements. Most people say, "but I need a solution that doesn't require schema alterations."
Most developers probably don't do Class Table Inheritance. More commonly, they just use a one-to-many table structure like other people have mentioned, and implement the business rules solely in application code. That is, your application contains the code to insert only the equipment appropriate for each operation type.
The problem with relying on the app logic is that it can contain bugs and might insert data the doesn't satisfy the business rules. The advantage of Class Table Inheritance is that with well-designed constraints, the RDBMS enforces data integrity consistently. You have assurance that the database literally can't store incorrect data.
But this can also be limiting, for instance if your business rules change and you need to adjust the data. The common solution in this case is to write a script to dump all the data out, change your schema, and then reload the data in the form that is now allowed (Extract, Transform, and Load = ETL).
So you have to decide: do you want to code this in the app layer, or the database schema layer? There are legitimate reasons to use either strategy, but it's going to be complex either way.
Re your comment: You seem to be talking about storing expressions as strings in data fields. I recommend against doing that. The database is for storing data, not code. You can do some limited logic in constraints or triggers, but code belongs in your application.
If you have too many operations to model in separate tables, then model it in application code. Storing expressions in data columns and expecting SQL to use them for evaluating queries would be like designing an application around heavy use of eval()
.
I think you should have either a one-to-many or many-to-many relationship between Operation
and Equipment
, depending on whether there is one Equipment
entry per piece of equipment, or per equipment type.
I would advise against putting business logic into your database schema, as business logic is subject to change and you'd rather not have to change your schema in response.
Looks like you'll need to be able to group certain equipment together as either conjunction or disjunction and combine these groups together...
OperationEquipmentGroup
id int
operation_id int
is_conjuction bit
OperationEquipment
id int
operation_equipment_group_id int
equipment_id
You can add ordering columns if that is important and maybe another column to the group table to specify how groups are combined (only makes sense if ordered). But, by your examples, it looks like groups are only conjuncted together.
Since Operations can have one or more piece of equipment, you should use a linking table. Your schema would be like this:
Operation
- ID
- othercolumn
Equipment
- ID
- othercolumn
Operation_Equipment_Link
- OperationID
- EquipmentID
The two fields in the third table can be set up as a composite primary key, so you don't need a third field and can more easily keep duplicates out of the table.
In addition to Nicholai's suggestion I solved a similar problem as following:
Table Operation has an additional field "OperationType"
Table Equipment has an additional field "EquipmentType"
I have an additional table "DefaultOperationEquipmentType" specifying which EquipmentType needs to be include with each OperationType, e.g.
OperationType EquipmentType
==============.=============.
Foo_Type A_Type
Foo_Type B_Type
Baz_Type B_Type
Baz_Type C_Type
My application doesn't need complex conditions like (A or B) because in my business logic both alternative equipments belong to the same type of equipment, e.g. in a PC environment I could have an equipment Mouse (A) or Trackball (B), but they both belong to EquipmentType "PointingDevice_Type"
Hope that helps
Be Aware I have not tested this in the wild. That being said, the best* way I can see to do a mapping is with a denormalized table for the grouping.
*(aside from Bill's way, which is hard to set up, but masterful when done correctly)
Operations:
--------------------
Op_ID int not null pk
Op_Name varchar 500
Equipment:
--------------------
Eq_ID int not null pk
Eq_Name varchar 500
Total_Available int
Group:
--------------------
Group_ID int not null pk
-- Here you have a choice. You can either:
-- Not recommended
Equip varchar(500) --Stores a list of EQ_ID's {1, 3, 15}
-- Recommended
Eq_ID_1 bit
Eq_1_Total_Required
Eq_ID_2 bit
Eq_2_Total_Required
Eq_ID_3 bit
Eq_3_Total_Required
-- ... etc.
Operations_to_Group_Mapping:
--------------------
Group_ID int not null frk
Op_ID int not null frk
Thus, in case X: A | (D & E & F)
Operations:
--------------------
Op_ID Op_Name
1 X
Equipment:
--------------------
Eq_ID Eq_Name Total_Available
1 A 5
-- ... snip ...
22 D 15
23 E 0
24 F 2
Group:
--------------------
Group_ID Eq_ID_1 Eq_1_Total_Required -- ... etc. ...
1 TRUE 3
-- ... snip ...
2 FALSE 0
Operations_to_Group_Mapping:
--------------------
Group_ID Op_ID
1 1
2 1
As loathe as I am to put recursive (tree) structures in SQL, it sounds like this is really what you're looking for. I would use something modeled like this:
Operation
----------------
OperationID PK
RootEquipmentGroupID FK -> EquipmentGroup.EquipmentGroupID
...
Equipment
----------------
EquipmentID PK
...
EquipmentGroup
----------------
EquipmentGroupID PK
LogicalOperator
EquipmentGroupEquipment
----------------
EquipmentGroupID | (also FK -> EquipmentGroup.EquipmentGroupID)
EntityType | PK (all 3 columns)
EntityID | (not FK, but references either Equipment.EquipmentID
or EquipmentGroup.EquipmentGroupID)
Now that I've put forth an arguably ugly schema, allow me to explain a bit...
Every equipment group can either be an and
group or an or
group (as designated by the LogicalOperator
column). The members of each group are defined in the EquipmentGroupEquipment
table, with EntityID
referencing either Equipment.EquipmentID
or another EquipmentGroup.EquipmentGroupID
, the target being determined by the value in EntityType
. This will allow you to compose a group that consists of equipment or other groups.
This will allow you to represent something as simple as "requires equipment A", which would look like this:
EquipmentGroupID LogicalOperator
--------------------------------------------
1 'AND'
EquipmentGroupID EntityType EntityID
--------------------------------------------
1 1 'A'
...all the way to your "A | (D & E & F)", which would look like this:
EquipmentGroupID LogicalOperator
--------------------------------------------
1 'OR'
2 'AND'
EquipmentGroupID EntityType EntityID
--------------------------------------------
1 1 'A'
1 2 2 -- group ID 2
2 1 'D'
2 1 'E'
2 1 'F'
(I realize that I've mixed data types in the EntityID
column; this is just to make it clearer. Obviously you wouldn't do this in an actual implementation)
This would also allow you to represent structures of arbitrary complexity. While I realize that you (correctly) don't wish to overarchitect the solution, I don't think you can really get away with less without breaking 1NF (by combining multiple equipment into a single column).
From what I understood you want to store the equipments in relation to the operations in a way that will allow you to apply your business logic to it later, in that case you'll need 3 tables:
Operations:
- ID
- name
Equipment:
- ID
- name
Operations_Equipment:
- equipment_id
- operation_id
- symbol
Where symbol is A, B, C, etc...
If you have the condition like (A & B) | (C & D)
you can know which equipment is which easily.
精彩评论