Can items with most common descendants be found using SQL / MySQL?
Can SQL be used to find all the brands that has the most common categories?
For example, the brand "Dove" can have category of Soap, Skin Care, Shampoo It is to find all the brands that has the most matching categories, in other words, the most similar brands.
It can be done pr开发者_开发百科ogrammatically using Ruby or PHP: just take a brand, and loop through all the other brands, and see how many matching categories there are, and sort by it. But if there are 2000 brands, then there needs to be 2000 queries per brand. (unless we pre-cache all the 2000 query results, so for all 2000 brands, we re-use those results)
Can it be done by SQL / MySQL by 1 query?
Say, the table has:
entities
--------
id
type = brand or category or product
name
entities_parent_child
--------------------
parent_id
child_id
the table above has an entry for each parent = brand and child = product, and also an entry for each parent = category and child = product, so brand has to relate to category by products.
I think the hard part for SQL is: find all the maximum matching counts, and sort by those numbers.
I agree with wuputah's comment. For this problem an "entities" table is not the answer. You've given yourself a hint the design is wrong when you say you cannot form a query to get the answers you want.
Create a proper hierarchy for your data, with separate tables for separate real word entities, yours will be:
[Brands]
[Categories]
[Products]
If you need help with defining trees and hierarchies in SQL I suggest you pick up a copy of Celko's Trees and Hierarchies in SQL for Smarties.
SQL has no concept of polymorphism so don't try to design your database to fit your programming language. Databases work with sets, so think in sets.
To find similar brands join your tables and use grouping:
SELECT Brands.brand_name, COUNT(Categores.category_name) as category_count
FROM Brands INNER JOIN Categories
ON Brands.brand_name = Categories.brand_name
GROUP BY Brands.brand_name
ORDER BY Brands.brand_name, COUNT(Categores.category_name) -- add DESC if you want largest count at the top
That gives you the basic idea, if you can expand on the requirement:
...find all the maximum matching counts, and sort by those numbers
Then I can help redesign the query and, if necessary, the schema design.
精彩评论