Redundant user generated data in a database - how to structure? E.g., recipes share ingredients
I'm building a website for which a large amount of user generated data will be stored in a back end db and will be discoverable by all users.
By its nature, a portion of the user entries will be redundant. For the sake of example, let's imagine this data is made of recipes. Suppose the following entries:
Title: Lemony dressing: - 1 tbsp lemon juice - 2 tbsp olive oil Just mix!
and
Title: Lemony marinade: - 2 tbsp lemon juice - 2 tbsp olive oil - 1 tbsp soy sauce Just mix the ingredients together!
Clearly, those 2 entries have some redundant information: half the title is shared, the instructions are shared partially, one ingredient is shared exactly, another ingredient is partially shared....
Furthermore, one could easily imagine a recipe made of several sub-recipes, which can also exist on their own. So for example, you could have a marinated flank steak recipe that uses the lemony marina开发者_StackOverflow社区de as a step. So recipes can be defined recursively.
My question is how should I structure my SQL recipe database? My gut feeling is that it makes sense to store the entry "lemon juice" in an Ingredients table, the quantity "2 tbsp" in a Quantity one, and then use their keys in the Recipe table.
But how far should I go? Should I split "lemon juice" into two entries, in order to prevent duplication with "orange juice" and "lemon zest"? After all, ingredients can also be defined recursively.
But then sentences can too, and instructions on how to use the ingredients are going to have some overlap - but over the long term maybe not all that much. Does it make sense then to store each word separately? Why not letters?
I'm not sure how to consider the trade-offs like this. I suppose that if I minimize the redundancy of the info in the tables, there must be some cost. Is it performance wise, or is it a cost paid at conception?
Thank you,
JDelage
Everything is composed of something until you hit the fundamental elements of your domain. In a restaurant environment these would be what a chef would call ingredients, ie. stuff that comes from a food processing plant or a farm. However at the food processing plant these are the output rather than the input. I've tried to draw this below, it's not complicated once you understand that it is a model of how things are.
So model this, using a table ProductBase
Ingredients, recipes, menus, etc. are all the same. They are items that are made up from other items, until you get to items that are generally agreed to be fundamental elements, e.g. lemons, salt, pepper, beef steak, etc.
So if you have a recipe A, that contains ingredients A1, A2 etc. then these ingredients A1,A2 could be a recipe (i.e. a list of parts that are processed to produce something), or something pre-prepared. Let the data model reflect this.
From the data model you don't need to make distinctions between what we call recipes, ingredients, etc, I store all of these in a single products table, with ParentProductID,
The parent product is used to provide containers - a list is defined by its contents.
The Linked product is used to define the container elemenent reference, i.e. a recipe has a list of ingredients, so there is an entry of type INGREDIENT, which has LinkedProductID of the actual product used.
CREATE TABLE `productbase` (
`productid` CHAR(30) NULL,
`productname` VARCHAR(200) NULL,
`parentproductid` CHAR(30) NULL,
`linkedproductid` CHAR(30) NULL,
`categoryid` CHAR(30) NULL,
`supplierid` CHAR(30) NULL,
`type` CHAR(30) NULL,
`subtype` INT(11) NULL DEFAULT NULL,
`cost` DECIMAL(10,4) NULL DEFAULT NULL,
`mcu` CHAR(30) NULL,
`mcuperpack` DECIMAL(10,4) NULL DEFAULT NULL,
`quantity` DECIMAL(10,4) NULL DEFAULT NULL,
)
In addition to what Richard Harrison said: I don't know if you do OOP in addition to DB development, but what you're describing is the database equivalent of the Composite pattern. I suggest looking into implementations of that; both what people have done on the database side, and the OOP side for your ORM/data layer.
精彩评论