How to efficiently store data with many categories and many properties?
We have a large amount of data in many categories, each with many properties, e.g.
category 1: Book
properties: BookID, BookName, BookType, BookAuthor, BookPrice
category 2: Fruit
properties: FruitID, FruitName, FruitShape, FruitColor, FruitPrice
We have many categories like Book and Fruit. Obviously we could create one table per category (in MySQL, for example), but that means creating a great many tables and writing many "adapters" to manipulate the data in a uniform way.
The difficulties are:
1) Every category has different properties and this results in a different data structure.
2) The properties of any category may need to change at any time.
3) The data is hard to manipulate if each category has its own table (too many tables).
How do you store such kind of data?
You can separate the database into two parts: Definition Tables and Data Tables. Basically, the Definition Tables are used to interpret the Data Tables, where the actual data is stored (some would say the definitions are more elegant when represented in XML).
The following is the basic idea.
Definition Tables:
TABLE class
class_id (int)
class_name (varchar)
TABLE class_property
property_id (int)
class_id (int)
property_name (varchar)
property_type (varchar)
Data Tables:
TABLE object
object_id (int)
class_id (int)
TABLE object_property
object_id (int)
property_id (int)
property_value (varchar)
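A minimal sketch of the definition/data split above, assuming SQLite (via Python's built-in `sqlite3`) instead of MySQL. Two assumptions beyond the listing: `object_property` carries an `object_id` column (without it a value cannot be tied to the object it belongs to), and `object.class_id` is an integer to match `class.class_id`. The sample book values are made up for illustration.

```python
import sqlite3

# Definition tables (class, class_property) describe the categories;
# data tables (object, object_property) hold one row per property value.
SCHEMA = """
CREATE TABLE class (
    class_id   INTEGER PRIMARY KEY,
    class_name TEXT NOT NULL UNIQUE
);
CREATE TABLE class_property (
    property_id   INTEGER PRIMARY KEY,
    class_id      INTEGER NOT NULL REFERENCES class(class_id),
    property_name TEXT NOT NULL,
    property_type TEXT NOT NULL
);
CREATE TABLE object (
    object_id INTEGER PRIMARY KEY,
    class_id  INTEGER NOT NULL REFERENCES class(class_id)
);
CREATE TABLE object_property (
    object_id      INTEGER NOT NULL REFERENCES object(object_id),
    property_id    INTEGER NOT NULL REFERENCES class_property(property_id),
    property_value TEXT,
    PRIMARY KEY (object_id, property_id)
);
"""

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Definition: the Book category and (some of) its properties.
    conn.execute("INSERT INTO class VALUES (1, 'Book')")
    conn.executemany(
        "INSERT INTO class_property VALUES (?, 1, ?, ?)",
        [(1, "BookName", "varchar"), (2, "BookAuthor", "varchar"), (3, "BookPrice", "decimal")],
    )
    # Data: one Book object (object_id plays the role of BookID).
    conn.execute("INSERT INTO object VALUES (100, 1)")
    conn.executemany(
        "INSERT INTO object_property VALUES (100, ?, ?)",
        [(1, "Example Book"), (2, "A. Author"), (3, "12.50")],
    )
    return conn

def properties_of(conn, object_id):
    """Return {property_name: property_value} for one object, resolved through the definitions."""
    rows = conn.execute(
        """SELECT cp.property_name, op.property_value
           FROM object_property AS op
           JOIN class_property AS cp ON cp.property_id = op.property_id
           WHERE op.object_id = ?
           ORDER BY cp.property_id""",
        (object_id,),
    )
    return dict(rows.fetchall())
```

Adding a Fruit category is then just more rows in `class` and `class_property`; no new table is created.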
It would be best if you also created an additional layer that interprets this structure, so that the data layer can operate on the data more easily. You must, of course, also take into consideration performance, ease of querying, etc.
Just my two cents; I hope it helps.
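One possible shape for such an interpretation layer, sketched under assumptions: values are stored as text (as `property_value (varchar)` suggests), and the `property_type` strings `"int"`, `"varchar"` and `"decimal"` are hypothetical names mapped to Python converters. The layer uses the definitions to turn raw rows into a typed record.

```python
from decimal import Decimal

# Hypothetical mapping from property_type strings (as stored in
# class_property) to converters; the type names are assumptions.
CONVERTERS = {
    "int": int,
    "varchar": str,
    "decimal": Decimal,
}

def interpret(definitions, values):
    """Turn raw data-table rows into a typed record.

    definitions: {property_id: (property_name, property_type)} for one class
    values:      [(property_id, property_value_as_text), ...] for one object
    """
    record = {}
    for property_id, raw in values:
        name, ptype = definitions[property_id]  # KeyError if the value has no definition
        convert = CONVERTERS.get(ptype, str)  # unknown types fall back to plain text
        record[name] = None if raw is None else convert(raw)
    return record
```

Keeping the conversion in one place means the rest of the application never has to know that every value is physically stored as a string.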
Regards.
If your data collection isn't too big, the Entity-Attribute-Value (EAV) model may fit the bill nicely.
In a nutshell, this structure lets you define the categories, the list of [required or optional] attributes (aka properties) that the entities in each category include, etc., in a set of tables known as the meta-data: the logical schema of the data, if you will. The entity instances are stored in two tables, a header table and a values table, whereby each attribute is stored in a single [SQL] record of the latter table (aka "vertical" storage: what used to be one record in the traditional DBMS model is made up of several records of the values table).
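To illustrate the "vertical" storage, here is a sketch with a header table and a values table (the names `entity` and `entity_value` are hypothetical), using SQLite through Python's `sqlite3`. A common way to read entities back as ordinary horizontal rows is conditional aggregation: one `MAX(CASE ...)` column per attribute, grouped by entity.

```python
import sqlite3

def demo_pivot():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Header table: one row per entity.
        CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
        -- Values table: one row per (entity, attribute) pair.
        CREATE TABLE entity_value (
            entity_id INTEGER NOT NULL REFERENCES entity(entity_id),
            attribute TEXT NOT NULL,
            value     TEXT,
            PRIMARY KEY (entity_id, attribute)
        );
        INSERT INTO entity VALUES (1, 'Fruit'), (2, 'Fruit');
        INSERT INTO entity_value VALUES
            (1, 'FruitName', 'Apple'),  (1, 'FruitColor', 'red'),
            (2, 'FruitName', 'Banana'), (2, 'FruitColor', 'yellow');
    """)
    # Each MAX(CASE ...) picks the single value row for that attribute,
    # folding several vertical rows back into one horizontal record.
    return conn.execute("""
        SELECT e.entity_id,
               MAX(CASE WHEN v.attribute = 'FruitName'  THEN v.value END) AS FruitName,
               MAX(CASE WHEN v.attribute = 'FruitColor' THEN v.value END) AS FruitColor
        FROM entity AS e
        JOIN entity_value AS v ON v.entity_id = e.entity_id
        WHERE e.category = 'Fruit'
        GROUP BY e.entity_id
        ORDER BY e.entity_id
    """).fetchall()
```

Calling `demo_pivot()` yields one tuple per fruit, e.g. `(1, 'Apple', 'red')`. This pivoting (one join/aggregate per attribute) is also where the scaling cost mentioned below tends to show up as the catalog grows.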
This format is very practical, in particular for its flexibility: it allows both late and ongoing changes to the logical schema (addition of new categories, additions/changes to the attributes of a given category, etc.), as well as implicit, data-driven handling of the underlying catalog's logical schema at the application level. The main drawbacks of this format are its [somewhat] more sophisticated and abstract implementation and, chiefly, some limitations with regard to scaling etc. when the catalog grows, say into the range of a million or more entities.
See the EAV model described in more detail in this SO answer of mine.
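A small sketch of what "data-driven handling of the logical schema" can look like, assuming a hypothetical meta-data table `category_attribute` that flags each attribute as required or optional (SQLite via Python's `sqlite3`). Changing the logical schema, whether adding an attribute or a whole category, is an ordinary `INSERT`; no `ALTER TABLE` is needed, and validation code reads the rules from the data.

```python
import sqlite3

def make_catalog():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE category_attribute (
            category  TEXT NOT NULL,
            attribute TEXT NOT NULL,
            required  INTEGER NOT NULL DEFAULT 0,  -- 1 = required, 0 = optional
            PRIMARY KEY (category, attribute)
        )
    """)
    conn.executemany(
        "INSERT INTO category_attribute VALUES (?, ?, ?)",
        [("Book", "BookName", 1), ("Book", "BookAuthor", 0)],
    )
    return conn

def missing_required(conn, category, values):
    """Return the required attributes of `category` absent from `values`, read from the meta-data."""
    rows = conn.execute(
        "SELECT attribute FROM category_attribute "
        "WHERE category = ? AND required = 1 ORDER BY attribute",
        (category,),
    )
    return [name for (name,) in rows if name not in values]
```

For example, after `conn.execute("INSERT INTO category_attribute VALUES ('Book', 'BookPrice', 1)")`, a Book that only has a `BookName` is reported as missing `BookPrice`, without any change to application code or physical tables.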
Triggered by this question and other similar ones, I wrote a blog post on how to handle such cases using a graph database. In short, graph databases don't have the "how do I force a tree/hierarchy into tables" problem, as there's simply no need for it: you store your tree structure as it is. They're not good at everything (creating reports, for example), but this is a case where graph databases shine.