Database design: EAV options?
This is just a database concept question: what are the pros and cons of the following model for EAV?
Model 1:
TABLE: attribute_value
======================================
| id | fk_id | attribute | value |
======================================
| 1 | 10 | FName | John |
| 2 | 10 | Lname | Doe |
| 3 | 55 | FName | Bob |
| 4 | 55 | Lname | Smith |
--------------------------------------
Model 2:
TABLE: attribute
==================
| id | attribute |
==================
| 1 | FName |
| 2 | Lname |
------------------
TABLE: value
=====================================
| id | attribute_id | fk_id | value |
=====================================
| 1 | 1 | 10 | John |
| 2 | 2 | 10 | Doe |
| 3 | 1 | 55 | Bob |
| 4 | 2 | 55 | Smith |
-------------------------------------
One benefit I see with Model 2 is that the attribute names are not duplicated in every row.
Although minimalist as shown, the attribute table of Model 2 introduces the concept of metadata into the mix, with all the good that comes from it. There are other advantages to Model 2, for example the performance gains associated with the smaller row size of the value table, but I'd like to focus on the metadata concept.
Even as-is, Model 2's attribute table constitutes a repository of all valid attributes (with Model 1 one would need to run an aggregate query of sorts to get such a list). Also, as-is, the repository is sufficient to introduce foreign key constraints that help maintain the integrity of the dataset (with Model 1 one would need external forms of validation for the values stored in the attribute column).
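To make the integrity point concrete, here is a minimal sketch of Model 2 using Python's built-in sqlite3 module. The table and column names come from the question; the choice of SQLite, the column types, and the bogus attribute id 99 are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    attribute TEXT NOT NULL UNIQUE
);
CREATE TABLE value (
    id           INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    fk_id        INTEGER NOT NULL,
    value        TEXT
);
INSERT INTO attribute (id, attribute) VALUES (1, 'FName'), (2, 'Lname');
INSERT INTO value (attribute_id, fk_id, value) VALUES
    (1, 10, 'John'), (2, 10, 'Doe'), (1, 55, 'Bob'), (2, 55, 'Smith');
""")

# The list of valid attributes is a plain lookup, no aggregate needed.
valid_attributes = [row[0] for row in
                    conn.execute("SELECT attribute FROM attribute ORDER BY id")]

# A value row pointing at an undefined attribute (id 99) is rejected.
try:
    conn.execute(
        "INSERT INTO value (attribute_id, fk_id, value) VALUES (99, 10, 'oops')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(valid_attributes, rejected)
```

With Model 1 the equivalent list would come from something like SELECT DISTINCT attribute FROM attribute_value, and a misspelled attribute name would be accepted silently unless a separate check is added.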
With a few simple additions, the attribute table can become a versatile repository which can be used for various purposes. For example the table may include some of the following:
- info such as the display-friendly name of each attribute
- some flags indicating the type of field (numeric vs. string vs. date etc.), for differentiated handling / processing
- the particular value table where the underlying attribute is stored (Model 2 only shows one table, but optimization/scaling sometimes prompts splitting the tables)
- the fact that the attribute may be stored as its own column in the value table (again a form of optimization, essentially getting the best of both worlds: the flexibility of the EAV schema, but the performance of the traditional relational model for the attributes that are the most used and/or the most common to all entities)
- the ability to rename attributes, without disturbing the main table. Changes at meta-data level only.
- various application-oriented semantics. For example indicators that a particular attribute should be offered as one of the basic vs. advanced search fields.
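A few of the items above can be sketched as follows, again with Python's sqlite3. The columns display_name, data_type and basic_search, the Age attribute, and the load_entity helper are all hypothetical names invented for this illustration; they are not part of either model in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    id           INTEGER PRIMARY KEY,
    attribute    TEXT NOT NULL UNIQUE,
    display_name TEXT NOT NULL,              -- hypothetical: label shown to users
    data_type    TEXT NOT NULL,              -- hypothetical: 'string' or 'numeric'
    basic_search INTEGER NOT NULL DEFAULT 0  -- hypothetical: offer in basic search?
);
CREATE TABLE value (
    id           INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    fk_id        INTEGER NOT NULL,
    value        TEXT
);
INSERT INTO attribute VALUES
    (1, 'FName', 'First name', 'string',  1),
    (2, 'Lname', 'Last name',  'string',  1),
    (3, 'Age',   'Age',        'numeric', 0);  -- Age is an invented attribute
INSERT INTO value (attribute_id, fk_id, value) VALUES
    (1, 10, 'John'), (2, 10, 'Doe'), (3, 10, '42');
""")

# The application picks a converter from the metadata, not from hard-coded names.
CASTS = {"string": str, "numeric": float}

def load_entity(conn, fk_id):
    """Return {display_name: typed value} for one entity."""
    rows = conn.execute(
        """SELECT a.display_name, a.data_type, v.value
           FROM value v JOIN attribute a ON a.id = v.attribute_id
           WHERE v.fk_id = ? ORDER BY a.id""",
        (fk_id,))
    return {name: CASTS[dtype](raw) for name, dtype, raw in rows}

entity = load_entity(conn, 10)

# Renaming an attribute touches one metadata row and no value rows.
cur = conn.execute("UPDATE attribute SET attribute = 'LName' WHERE attribute = 'Lname'")
rows_touched = cur.rowcount

print(entity, rows_touched)
```

The same rename under Model 1 would have to update every row of attribute_value that carries the old name.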
In a nutshell, the attribute table becomes a resource which allows the application to be truly data-driven (or more precisely, metadata-driven). Indeed you may also want an entity table, i.e. one where the metadata pertaining to the various entity types is gathered: which entity types exist, which attributes are allowed for which entity type, etc.
Now... do pay heed to the comment from zerkms, below the question itself. For all its benefits, the EAV model also comes with its share of drawbacks and challenges; as hinted there, the complexity of the queries comes to mind, and so do performance issues. These concerns should not, however, disqualify EAV a priori: there are many use cases where EAV is a better approach.
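As an illustration of the query complexity mentioned above, this sketch rebuilds one row per entity from the Model 2 tables. It uses Python's sqlite3 and conditional aggregation, which is one common way to pivot EAV data; the result column names fname and lname are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    attribute TEXT NOT NULL UNIQUE
);
CREATE TABLE value (
    id           INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    fk_id        INTEGER NOT NULL,
    value        TEXT
);
INSERT INTO attribute (id, attribute) VALUES (1, 'FName'), (2, 'Lname');
INSERT INTO value (attribute_id, fk_id, value) VALUES
    (1, 10, 'John'), (2, 10, 'Doe'), (1, 55, 'Bob'), (2, 55, 'Smith');
""")

# One CASE expression per attribute; every new attribute adds another one.
PIVOT = """
SELECT v.fk_id,
       MAX(CASE WHEN a.attribute = 'FName' THEN v.value END) AS fname,
       MAX(CASE WHEN a.attribute = 'Lname' THEN v.value END) AS lname
FROM value v
JOIN attribute a ON a.id = v.attribute_id
GROUP BY v.fk_id
ORDER BY v.fk_id
"""

people = conn.execute(PIVOT).fetchall()
print(people)
```

In a conventional table with fname and lname columns the same result is a plain SELECT, which is the trade-off being weighed here.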
Assuming EAV is the choice, then Model 2, or even something slightly more sophisticated, is definitely superior to Model 1.
At the conceptual level, these two models are virtually identical. You've just replaced strings with ID numbers. That's all.
As far as foreign keys go, you could impose a foreign key constraint on "attribute" in Model 1 if you wanted to.
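For what it's worth, here is a rough sketch of that constraint on Model 1, using Python's sqlite3. It assumes a lookup table keyed directly by the attribute name; that lookup table (called attribute here) and the misspelled 'FNmae' are inventions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Hypothetical lookup table whose key is the attribute name itself.
CREATE TABLE attribute (
    attribute TEXT PRIMARY KEY
);
CREATE TABLE attribute_value (
    id        INTEGER PRIMARY KEY,
    fk_id     INTEGER NOT NULL,
    attribute TEXT NOT NULL REFERENCES attribute(attribute),
    value     TEXT
);
INSERT INTO attribute (attribute) VALUES ('FName'), ('Lname');
INSERT INTO attribute_value (fk_id, attribute, value) VALUES
    (10, 'FName', 'John'), (10, 'Lname', 'Doe');
""")

# A misspelled attribute name violates the string-keyed foreign key.
try:
    conn.execute(
        "INSERT INTO attribute_value (fk_id, attribute, value) "
        "VALUES (55, 'FNmae', 'Bob')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True

print(typo_rejected)
```

The only structural difference from Model 2 is that the key being referenced is the string rather than a surrogate integer.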
As far as pros and cons go, there's really no difference between these two implementations of EAV. All Bill Karwin's points apply to both.
For Model 2, you can impose a foreign key on attribute_id and make sure that only defined attributes can enter the table.
Also for Model 2, queries that fetch values by attribute id can be faster, provided the foreign-key column is indexed (some databases create that index along with the foreign key, others need it created explicitly).
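A small sketch of the indexing point, using Python's sqlite3. SQLite does not index the referencing column on its own, so the index is created explicitly here; the index name idx_value_attribute_id is an assumption, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (
    id        INTEGER PRIMARY KEY,
    attribute TEXT NOT NULL UNIQUE
);
CREATE TABLE value (
    id           INTEGER PRIMARY KEY,
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    fk_id        INTEGER NOT NULL,
    value        TEXT
);
-- Explicit index on the foreign-key column (name is hypothetical).
CREATE INDEX idx_value_attribute_id ON value (attribute_id);
INSERT INTO attribute (id, attribute) VALUES (1, 'FName'), (2, 'Lname');
INSERT INTO value (attribute_id, fk_id, value) VALUES
    (1, 10, 'John'), (2, 10, 'Doe'), (1, 55, 'Bob'), (2, 55, 'Smith');
""")

QUERY = "SELECT fk_id, value FROM value WHERE attribute_id = ? ORDER BY fk_id"

# Ask SQLite how it would run the lookup; the last column holds the description.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

first_names = conn.execute(QUERY, (1,)).fetchall()
print(plan_text)
print(first_names)
```

An integer comparison on attribute_id is also cheaper than comparing attribute strings, though on a table this small neither effect is measurable.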