开发者

Design question: Filterable attributes, SQL

I have two tables in my database, Operation and Equipment. An operation requires zero or more attributes. However, there's some logic in how the attributes are attributed:

  • Operation Foo requires equipment A and B
  • Operation Bar requires no equipment
  • Operation Baz requires equipment B and either C or D
  • Operation Quux requires equipment (A or B) and (C or D)

What's the best way to represent this in SQL?

I'm sure people have d开发者_运维百科one this before, but I have no idea where to start.

(FWIW, my application is built with Python and Django.)

Update 1: There will be around a thousand Operation rows and about thirty Equipment rows. The information is coming in CSV form similar to the description above: Quux, (A & B) | (C & D)

Update 2: The level of conjunctions & disjunctions shouldn't be too deep. The Quux example is probably the most complicated, though there appears to be a A | (D & E & F) case.


Think about how you'd model the operations in OO design: the operations would be subclasss of a common superclass Operation. Each subclass would have mandatory object members for the respective equipment required by that operation.

The way to model this with SQL is Class Table Inheritance. Create a common super-table:

CREATE TABLE Operation (
  operation_id   SERIAL PRIMARY KEY,
  operation_type CHAR(1) NOT NULL,
  UNIQUE KEY (operation_id, operation_type),
  FOREIGN KEY (operation_type) REFERENCES OperationTypes(operation_type)
);

Then for each operation type, define a sub-table with a column for each required equipment type. For example, OperationFoo has a column for each of equipA and equipB. Since they are both required, the columns are NOT NULL. Constrain them to the correct types by creating a Class Table Inheritance super-table for equipment too.

CREATE TABLE OperationFoo (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'F'),
  equipA         INT NOT NULL,
  equipB         INT NOT NULL,
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type),
  FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
  FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id)
);

Table OperationBar requires no equipment, so it has no equip columns:

CREATE TABLE OperationBar (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'B'),
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type)
);

Table OperationBaz has one required equipment equipA, and then at least one of equipB and equipC must be NOT NULL. Use a CHECK constraint for this:

CREATE TABLE OperationBaz (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Z'),
  equipA         INT NOT NULL,
  equipB         INT,
  equipC         INT,
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type)
  FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
  FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
  FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
  CHECK (COALESCE(equipB, equipC) IS NOT NULL)
);

Likewise in table OperationQuux you can use a CHECK constraint to make sure at least one equipment resource of each pair is non-null:

CREATE TABLE OperationQuux (
  operation_id   INT PRIMARY KEY,
  operation_type CHAR(1) NOT NULL CHECK (operation_type = 'Q'),
  equipA         INT,
  equipB         INT,
  equipC         INT,
  equipD         INT,
  FOREIGN KEY (operation_id, operation_type) 
      REFERENCES Operations(operation_d, operation_type),
  FOREIGN KEY (equipA) REFERENCES EquipmentA(equip_id),
  FOREIGN KEY (equipB) REFERENCES EquipmentB(equip_id),
  FOREIGN KEY (equipC) REFERENCES EquipmentC(equip_id),
  FOREIGN KEY (equipD) REFERENCES EquipmentD(equip_id),
  CHECK (COALESCE(equipA, equipB) IS NOT NULL AND COALESCE(equipC, equipD) IS NOT NULL)
);

This may seem like a lot of work. But you asked how to do it in SQL. The best way to do it in SQL is to use declarative constraints to model your business rules. Obviously, this requires that you create a new sub-table every time you create a new operation type. This is best when the operations and business rules never (or hardly ever) change. But this may not fit your project requirements. Most people say, "but I need a solution that doesn't require schema alterations."

Most developers probably don't do Class Table Inheritance. More commonly, they just use a one-to-many table structure like other people have mentioned, and implement the business rules solely in application code. That is, your application contains the code to insert only the equipment appropriate for each operation type.

The problem with relying on the app logic is that it can contain bugs and might insert data the doesn't satisfy the business rules. The advantage of Class Table Inheritance is that with well-designed constraints, the RDBMS enforces data integrity consistently. You have assurance that the database literally can't store incorrect data.

But this can also be limiting, for instance if your business rules change and you need to adjust the data. The common solution in this case is to write a script to dump all the data out, change your schema, and then reload the data in the form that is now allowed (Extract, Transform, and Load = ETL).

So you have to decide: do you want to code this in the app layer, or the database schema layer? There are legitimate reasons to use either strategy, but it's going to be complex either way.


Re your comment: You seem to be talking about storing expressions as strings in data fields. I recommend against doing that. The database is for storing data, not code. You can do some limited logic in constraints or triggers, but code belongs in your application.

If you have too many operations to model in separate tables, then model it in application code. Storing expressions in data columns and expecting SQL to use them for evaluating queries would be like designing an application around heavy use of eval().


I think you should have either a one-to-many or many-to-many relationship between Operation and Equipment, depending on whether there is one Equipment entry per piece of equipment, or per equipment type.

I would advise against putting business logic into your database schema, as business logic is subject to change and you'd rather not have to change your schema in response.


Looks like you'll need to be able to group certain equipment together as either conjunction or disjunction and combine these groups together...

OperationEquipmentGroup
   id int
   operation_id int 
   is_conjuction bit 

OperationEquipment
   id int
   operation_equipment_group_id int
   equipment_id

You can add ordering columns if that is important and maybe another column to the group table to specify how groups are combined (only makes sense if ordered). But, by your examples, it looks like groups are only conjuncted together.


Since Operations can have one or more piece of equipment, you should use a linking table. Your schema would be like this:

Operation

  • ID
  • othercolumn

Equipment

  • ID
  • othercolumn

Operation_Equipment_Link

  • OperationID
  • EquipmentID

The two fields in the third table can be set up as a composite primary key, so you don't need a third field and can more easily keep duplicates out of the table.


In addition to Nicholai's suggestion I solved a similar problem as following:

Table Operation has an additional field "OperationType"

Table Equipment has an additional field "EquipmentType"

I have an additional table "DefaultOperationEquipmentType" specifying which EquipmentType needs to be include with each OperationType, e.g.

OperationType  EquipmentType
==============.=============.
Foo_Type       A_Type
Foo_Type       B_Type
Baz_Type       B_Type
Baz_Type       C_Type

My application doesn't need complex conditions like (A or B) because in my business logic both alternative equipments belong to the same type of equipment, e.g. in a PC environment I could have an equipment Mouse (A) or Trackball (B), but they both belong to EquipmentType "PointingDevice_Type"

Hope that helps


Be Aware I have not tested this in the wild. That being said, the best* way I can see to do a mapping is with a denormalized table for the grouping.

*(aside from Bill's way, which is hard to set up, but masterful when done correctly)

Operations:
--------------------
Op_ID int not null pk
Op_Name varchar 500

Equipment: 
--------------------
Eq_ID int not null pk
Eq_Name varchar 500
Total_Available int

Group:
--------------------
Group_ID int not null pk
-- Here you have a choice. You can either:
-- Not recommended   
Equip varchar(500) --Stores a list of EQ_ID's {1, 3, 15}
-- Recommended
Eq_ID_1 bit
Eq_1_Total_Required
Eq_ID_2 bit
Eq_2_Total_Required
Eq_ID_3 bit
Eq_3_Total_Required
-- ... etc.

Operations_to_Group_Mapping:
--------------------
Group_ID int not null frk
Op_ID int not null frk

Thus, in case X: A | (D & E & F)

Operations:
--------------------
Op_ID    Op_Name
1        X

Equipment: 
--------------------
Eq_ID    Eq_Name    Total_Available
1        A          5
-- ... snip ...
22       D          15
23       E          0
24       F          2

Group:
--------------------
Group_ID    Eq_ID_1    Eq_1_Total_Required -- ... etc. ...
1           TRUE       3
-- ... snip ...
2           FALSE      0

Operations_to_Group_Mapping:
--------------------
Group_ID    Op_ID
1           1
2           1 


As loathe as I am to put recursive (tree) structures in SQL, it sounds like this is really what you're looking for. I would use something modeled like this:

Operation
----------------
OperationID            PK
RootEquipmentGroupID   FK -> EquipmentGroup.EquipmentGroupID
...

Equipment
----------------
EquipmentID            PK
...

EquipmentGroup
----------------
EquipmentGroupID       PK
LogicalOperator

EquipmentGroupEquipment
----------------
EquipmentGroupID |     (also FK -> EquipmentGroup.EquipmentGroupID)
EntityType       |     PK (all 3 columns)
EntityID         |     (not FK, but references either Equipment.EquipmentID
                        or EquipmentGroup.EquipmentGroupID)

Now that I've put forth an arguably ugly schema, allow me to explain a bit...

Every equipment group can either be an and group or an or group (as designated by the LogicalOperator column). The members of each group are defined in the EquipmentGroupEquipment table, with EntityID referencing either Equipment.EquipmentID or another EquipmentGroup.EquipmentGroupID, the target being determined by the value in EntityType. This will allow you to compose a group that consists of equipment or other groups.

This will allow you to represent something as simple as "requires equipment A", which would look like this:

EquipmentGroupID   LogicalOperator
--------------------------------------------
1                  'AND'

EquipmentGroupID   EntityType   EntityID
--------------------------------------------
1                  1            'A'

...all the way to your "A | (D & E & F)", which would look like this:

EquipmentGroupID   LogicalOperator
--------------------------------------------
1                  'OR'
2                  'AND'

EquipmentGroupID   EntityType   EntityID
--------------------------------------------
1                  1            'A'
1                  2            2 -- group ID 2
2                  1            'D'
2                  1            'E'
2                  1            'F'

(I realize that I've mixed data types in the EntityID column; this is just to make it clearer. Obviously you wouldn't do this in an actual implementation)

This would also allow you to represent structures of arbitrary complexity. While I realize that you (correctly) don't wish to overarchitect the solution, I don't think you can really get away with less without breaking 1NF (by combining multiple equipment into a single column).


From what I understood you want to store the equipments in relation to the operations in a way that will allow you to apply your business logic to it later, in that case you'll need 3 tables:

Operations:

  • ID
  • name

Equipment:

  • ID
  • name

Operations_Equipment:

  • equipment_id
  • operation_id
  • symbol

Where symbol is A, B, C, etc...

If you have the condition like (A & B) | (C & D) you can know which equipment is which easily.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜