Searching for similar groupings; including diff and score (ie. Similar Recipes)
I'm trying to find the best way to determine how similar a group of items (in this example; ingredients in a guacamole recipe) is to all groups of items (recipes in a table; linked to another table of ingredients).
For instance; I have the following guacamole recipe:
3 Avocados
1 Vine-Ripened Tomatoes
1 Red Onion
3 Jalapenos
1 Sea Salt
1 Pepper
I want to run this recipe through the table of all my recipes to determine whether another recipe is similar to it (based on ingredients and counts), ordered by how similar it is. Additionally, I would like it to identify the differences (whether it's just a difference in the count of an ingredient, or a different ingredient).
A possible output would be:
3 Avocados
(- 1 Vine-Ripened Tomatoes)
1 Red Onion
3 Jalapenos
1 Sea Salt
(- 1 Pepper)
(+ Tabasco)
89.5% Identical
This could also be used for the following use case: "Given a list of ingredients in my refrigerator; what can I make to eat?".
Thanks for any assistance in pointing me in the right direction.
Off the top of my head, here are some issues I can see coming up with string matching:
3 Avocados and 2 Avocados both use avocado, but the strings are not a match.
1 tbsp salt and 15ml salt refer to the same quantity of salt, but the strings are not a match.
You might want to keep a table of recipe ingredients that also stores normalized quantities (i.e. everything would be converted to a specific unit before being put into the db). I'm making the assumption here that you'll already have a table for recipes and a table for ingredients, both of which are used as foreign keys here (making this a join table):
CREATE TABLE recipe_ingredients (
    recipe_id INT NOT NULL,
    ingredient_id INT NOT NULL,
    -- normalized amount; an explicit scale keeps fractional quantities
    quantity DECIMAL(10, 3) NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id),
    FOREIGN KEY (recipe_id) REFERENCES recipes (id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients (id)
);
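The normalization step itself could look something like the sketch below. It is only an illustration: the ML_PER_UNIT table and the normalize_quantity helper are hypothetical names, and the conversion factors are rounded kitchen conventions (1 tbsp taken as 15 ml, as in the salt example), not exact values.

```python
from decimal import Decimal

# Hypothetical conversion table: millilitres per unit (rounded conventions).
ML_PER_UNIT = {
    "ml": Decimal("1"),
    "tsp": Decimal("5"),
    "tbsp": Decimal("15"),
    "cup": Decimal("240"),
}


def normalize_quantity(amount, unit):
    """Convert an amount in a known volume unit to millilitres."""
    try:
        factor = ML_PER_UNIT[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}")
    # Go through str() so floats such as 0.5 do not carry binary noise.
    return Decimal(str(amount)) * factor


# "1 tbsp salt" and "15ml salt" now store the same quantity.
same = normalize_quantity(1, "tbsp") == normalize_quantity(15, "ml")
```

Countable ingredients (such as avocados) would simply be stored as a plain count, with no conversion.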
And then, when determining matches, you can find out which recipes contain the most of the ingredients you're looking for (this ignores quantities):
SELECT ri.recipe_id, COUNT(ri.ingredient_id) AS num_common_ingredients
FROM ingredients AS i
-- the WHERE clause on i.id discards unmatched rows, so an inner join is enough
INNER JOIN recipe_ingredients AS ri
    ON ri.ingredient_id = i.id
WHERE i.id IN (?) -- list of ingredient IDs being searched for
GROUP BY ri.recipe_id
ORDER BY COUNT(ri.ingredient_id) DESC;
The rows with the highest COUNT have the most similarity (because those recipes share the greatest number of ingredients with your search).
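As a way to try the ranking out, here is a small sketch using Python's built-in sqlite3 module. The rank_recipes helper and the sample rows are made up for illustration. It queries recipe_ingredients directly (the IDs being searched for are already known, so the join to ingredients is not needed for the count) and generates one placeholder per ID, since a single ? cannot be bound to a whole list.

```python
import sqlite3


def rank_recipes(conn, ingredient_ids):
    """Return (recipe_id, num_common_ingredients) rows, best match first."""
    ingredient_ids = list(ingredient_ids)
    if not ingredient_ids:
        return []
    # One "?" per ID being searched for.
    placeholders = ", ".join("?" for _ in ingredient_ids)
    sql = f"""
        SELECT ri.recipe_id, COUNT(ri.ingredient_id) AS num_common_ingredients
        FROM recipe_ingredients AS ri
        WHERE ri.ingredient_id IN ({placeholders})
        GROUP BY ri.recipe_id
        ORDER BY num_common_ingredients DESC, ri.recipe_id
    """
    return conn.execute(sql, ingredient_ids).fetchall()


# Made-up sample data: three recipes in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipe_ingredients ("
    " recipe_id INT NOT NULL, ingredient_id INT NOT NULL,"
    " quantity REAL NOT NULL,"
    " PRIMARY KEY (recipe_id, ingredient_id))"
)
conn.executemany(
    "INSERT INTO recipe_ingredients VALUES (?, ?, ?)",
    [
        (1, 1, 3), (1, 2, 1), (1, 3, 1),  # recipe 1 uses ingredients 1, 2, 3
        (2, 1, 2), (2, 4, 1),             # recipe 2 uses ingredients 1, 4
        (3, 5, 1),                        # recipe 3 uses ingredient 5 only
    ],
)
ranking = rank_recipes(conn, [1, 2, 3])
```

Recipes that share no ingredient with the search do not appear in the result at all.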
To determine similarity between quantities, once you have the recipes that match the greatest number of ingredients, you can compare each quantity given to the quantity stored in recipe_ingredients.
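One possible way to turn that comparison into a score and a diff is sketched below. It uses a weighted Jaccard index (sum of the smaller quantities divided by the sum of the larger ones), which is my own choice of metric rather than anything the question prescribes, so it will not reproduce the example percentage from the question. The compare_recipes name and the dictionary layout (ingredient name mapped to normalized quantity) are assumptions.

```python
def compare_recipes(query, candidate):
    """Compare two {ingredient: normalized_quantity} dicts.

    Returns (score, diff). score is the weighted Jaccard index in [0, 1].
    diff is a list of (sign, ingredient, amount) tuples describing how the
    candidate differs from the query: "+" means the candidate has more of
    that ingredient, "-" means it has less (or none).
    """
    overlap = 0.0
    total = 0.0
    diff = []
    for name in sorted(set(query) | set(candidate)):
        q = query.get(name, 0)
        c = candidate.get(name, 0)
        overlap += min(q, c)
        total += max(q, c)
        if c > q:
            diff.append(("+", name, c - q))
        elif c < q:
            diff.append(("-", name, q - c))
    # Two empty recipes are treated as identical.
    score = overlap / total if total else 1.0
    return score, diff


guacamole = {"avocado": 3, "tomato": 1, "red onion": 1,
             "jalapeno": 3, "sea salt": 1, "pepper": 1}
other = {"avocado": 3, "red onion": 1, "jalapeno": 3,
         "sea salt": 1, "tabasco": 1}
score, diff = compare_recipes(guacamole, other)
```

Because every quantity counts equally here, ingredients measured in large units would dominate the score; scaling each ingredient by a typical amount before comparing is one way around that.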