Taking two tables, 1-to-many, how can I filter down the many table and then join ALL the matches in the 1 table?
I apologize in advance for the length, the solution may well be trivial, just wanted to be as informative as I could.
The Tables
I have two tables of note, items and products, in a 1-to-many relationship. One item can have multiple products, which are variations in color and material. Brand is an external category table that doesn't have too much of a part to play in this select statement.
So an item is, for example, a specific shoe, e.g. a "park avenue" shoe. A product is, for example, merlot burnished calfskin. And the brand would just be Allen Edmonds. Overall you get an Allen Edmonds park avenue shoe in merlot burnished calfskin.
Missing results in a "show almost everything" search
Someone decided to create a manual flag to associate the default color and material with a shoe, so that when you search, each type of shoe only shows up once, and when you click on it you can find its other colors and materials. That's fine, but some shoes have no default material and color set. As an unfortunate result, items without at least one default set don't show up in the search.
Current Select Statement
Here is the current select, which filters out everything that doesn't have a default manually set:
SELECT DISTINCT items.ItemId
, items.Name
, items.BrandCategoryId
, items.CatalogPage
, items.GenderId
, items.PriceRetail
, items.PriceSell
, items.PriceHold
, items.Descr
, items.FlagStatus as ItemFlagStatus
, products.ImagetnURL
, products.FlagDefault
, products.ProductId
, products.Code as ProductCode
, products.Name as ProductName
, brands.Name as BrandName
FROM items
, products
, brands
WHERE items.ItemId = products.ItemId
AND items.BrandCode = brands.Code
AND items.FlagStatus != 'U'
AND products.FlagStatus != 'U'
AND products.FlagDefault = 'Y';
Not my choice of code, I suspect that the "DISTINCT" part of that statement is a bad idea, but I'm not exactly clear how to get rid of it.
The big problem I'm having right now, though, is the final line
AND products.FlagDefault = 'Y'
that filters out everything that doesn't have at least one manual default set.
Edit: Here's an explain for the query:
+----+-------------+----------+--------+-----------------------------------------------------------+---------+---------+-------------------------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+-----------------------------------------------------------+---------+---------+-------------------------+-------+--------------------------------+
| 1 | SIMPLE | brands | ALL | NULL | NULL | NULL | NULL | 38 | Using temporary |
| 1 | SIMPLE | products | ALL | FlagStatus,FlagStatus_2,FlagStatus_3,flagstatusanddefault | NULL | NULL | NULL | 16329 | Using where; Using join buffer |
| 1 | SIMPLE | items | eq_ref | PRIMARY,BrandCode,FlagStatus,FlagStatus_2,FlagStatus_3 | PRIMARY | 4 | sherman.products.ItemId | 1 | Using where |
+----+-------------+----------+--------+-----------------------------------------------------------+---------+---------+-------------------------+-------+--------------------------------+
3 rows in set (0.01 sec)
And here is a describe on products, items, and brands:
mysql> describe products;
+-------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+-------------------+-----------------------------+
| ProductId | int(11) | NO | PRI | NULL | auto_increment |
| ItemId | int(11) | YES | | NULL | |
| Code | varchar(15) | YES | MUL | NULL | |
| Name | varchar(100) | YES | | NULL | |
| MaterialId | int(11) | YES | MUL | NULL | |
| PriceRetail | decimal(6,2) | YES | | NULL | |
| PriceSell | decimal(6,2) | YES | | NULL | |
| PriceHold | decimal(6,2) | YES | | NULL | |
| Cost | decimal(6,2) | YES | | NULL | |
| FlagDefault | char(1) | NO | | N | |
| FlagStatus | char(1) | YES | MUL | NULL | |
| ImagetnURL | varchar(50) | YES | | NULL | |
| ImagefsURL | varchar(50) | YES | | NULL | |
| ImagelsURL | varchar(50) | YES | | NULL | |
| DateStatus | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| DateCreated | timestamp | YES | | NULL | |
+-------------+--------------+------+-----+-------------------+-----------------------------+
16 rows in set (0.02 sec)
mysql> describe items
-> ;
+-----------------+--------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+--------------+------+-----+-------------------+-----------------------------+
| ItemId | int(11) | NO | PRI | NULL | auto_increment |
| Code | varchar(25) | YES | | NULL | |
| Name | varchar(100) | YES | MUL | NULL | |
| BrandCode | char(2) | YES | MUL | NULL | |
| CatalogPage | int(3) | YES | | NULL | |
| BrandCategoryId | int(11) | YES | | NULL | |
| TypeId | int(11) | YES | MUL | NULL | |
| StyleId | int(11) | YES | MUL | NULL | |
| GenderId | int(11) | YES | MUL | NULL | |
| PriceRetail | decimal(6,2) | YES | | NULL | |
| PriceSell | decimal(6,2) | YES | | NULL | |
| PriceHold | decimal(6,2) | YES | | NULL | |
| Cost | decimal(6,2) | YES | | NULL | |
| PriceNote | longtext | YES | | NULL | |
| FlagTaxable | char(1) | YES | | NULL | |
| FlagStatus | char(1) | YES | MUL | NULL | |
| FlagFeatured | char(1) | YES | | NULL | |
| MaintFlagStatus | char(1) | YES | | NULL | |
| Descr | longtext | YES | | NULL | |
| DescrNote | longtext | YES | | NULL | |
| ImagetnURL | varchar(50) | YES | | NULL | |
| ImagefsURL | varchar(50) | YES | | NULL | |
| ImagelsURL | varchar(50) | YES | | NULL | |
| DateCreated | date | NO | | 0000-00-00 | |
| DateStatus | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+-----------------+--------------+------+-----+-------------------+-----------------------------+
25 rows in set (0.00 sec)
mysql> describe brands;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| BrandId | int(11) unsigned | NO | PRI | NULL | auto_increment |
| Code | varchar(6) | YES | | NULL | |
| PriceCode | varchar(4) | YES | | NULL | |
| Name | varchar(50) | YES | | NULL | |
| WebsiteURL | varchar(50) | YES | | NULL | |
| LogoURL | varchar(50) | YES | | NULL | |
| LogoTopURL | varchar(50) | YES | | NULL | |
| BrandURL | varchar(50) | YES | | NULL | |
| Descr | longtext | YES | | NULL | |
| DescrShort | longtext | YES | | NULL | |
| BeltDescr | longtext | YES | | NULL | |
| ImageURL | varchar(50) | YES | | NULL | |
| SaleImageURL | varchar(50) | YES | | NULL | |
| SaleCode | varchar(6) | YES | | NULL | |
| SaleDateBeg | date | YES | | NULL | |
| SaleDateEnd | date | YES | | NULL | |
| FlagStatus | char(1) | YES | | NULL | |
| DateStatus | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| DateCreated | timestamp | YES | | NULL | |
+--------------+------------------+------+-----+-------------------+-----------------------------+
19 rows in set (0.00 sec)
Possibilities that I am exploring
Subselect that grinds everything to a halt
I have a select statement that might work in a perfect, zero-execution-time world: it selects the first product for each item, ordered by the FlagDefault field, e.g.:
AND products.productid =
(select productid
from products
where products.itemid = items.itemid
AND products.FlagStatus != 'U'
order by FlagDefault='Y' desc
, productid
limit 1);
replacing the check for a manually toggled default with an id that's only ordered by default, even if it's not toggled, and only takes the first result.
That statement grinds to a halt, and actually causes other use on the site to put mysql statements into deadlock (I suppose because reading of those tables is making them unavailable elsewhere).
Join that makes sure one table is distinct and not the next?
One way to get around it that might work is doing a:
select distinct ItemId from products ORDER BY default
and then going on to obtain data for those ItemIds specifically, but I'm not sure how to make that happen in a single statement, or how to join against a select distinct. I also expect that making that select "distinct" in the first place isn't ideal, since it selects more than is needed and then cuts it down afterwards, but I don't have a better alternative for determining distinctness.
Advice?
In general, the select statement could use a lot of improvement, and specifically I could really use some advice on how to filter down the results for the most specific table and only -then- join upstream to the table that is the "one" in the one to many relationship.
Remove from WHERE:
AND products.FlagStatus != 'U'
AND products.FlagDefault = 'Y'
Add to FROM:
(
(SELECT ProductId
FROM products
WHERE FlagStatus != 'U' AND FlagDefault = 'Y')
UNION
(SELECT MIN(ProductId)
FROM products
WHERE FlagStatus != 'U'
GROUP BY ItemId
HAVING MAX(FlagDefault) != 'Y')
) AS defaults
Add to WHERE:
AND defaults.ProductId = products.ProductId
I'm using the term "non-hidden" for rows which have FlagStatus != 'U', since I'm assuming that's what the flag is for.
The first SELECT gives the ProductId of all default products, and the second one gives a ProductId for all the items without a default product. Hidden products are filtered out by both, so if a default product has been hidden, a non-default product is displayed instead. When the two are combined with UNION, you get a ProductId for every item that has some non-hidden product.
I'm assuming FlagDefault can only have values 'Y' or 'N'. The second query filters out the items having a default product by using MAX(FlagDefault), which works because 'Y' > 'N'.
By joining this to the products table of the original query, instead of filtering with FlagDefault, you should get the same results as the original, except you also get one row for every item which does not have a default product.
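To see what the defaults derived table produces, here is a minimal sketch on made-up rows. It runs the same two SELECTs through Python's built-in sqlite3 module rather than MySQL, so the parentheses around each SELECT are dropped (SQLite does not accept them in a UNION), and the sample data and the FlagStatus value 'A' are assumptions for illustration only.

```python
import sqlite3

# Hypothetical toy data, not the real tables:
#   item 1 has a visible default product (11)
#   item 2 has no default product at all
#   item 3 has a default product (30), but it is hidden (FlagStatus = 'U')
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    ProductId   INTEGER PRIMARY KEY,
    ItemId      INTEGER,
    FlagDefault TEXT NOT NULL DEFAULT 'N',
    FlagStatus  TEXT
);
INSERT INTO products (ProductId, ItemId, FlagDefault, FlagStatus) VALUES
    (10, 1, 'N', 'A'),
    (11, 1, 'Y', 'A'),
    (20, 2, 'N', 'A'),
    (21, 2, 'N', 'A'),
    (30, 3, 'Y', 'U'),
    (31, 3, 'N', 'A');
""")

# Same logic as the "defaults" derived table: visible defaults, plus the
# lowest visible ProductId for items whose visible products include no 'Y'.
DEFAULTS_SQL = """
SELECT ProductId
FROM products
WHERE FlagStatus != 'U' AND FlagDefault = 'Y'
UNION
SELECT MIN(ProductId)
FROM products
WHERE FlagStatus != 'U'
GROUP BY ItemId
HAVING MAX(FlagDefault) != 'Y'
"""

default_ids = sorted(row[0] for row in conn.execute(DEFAULTS_SQL))
print(default_ids)
```

Each item ends up with exactly one ProductId here: the hand-picked default where a visible one exists, and otherwise the lowest visible ProductId.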
I've tested this query on its own, but not combined with your original one, since I don't have any meaningful data (read: your data) to test it against. This part works, so the combination should also work. For the same reason, I don't have any real numbers about performance, and I'm not an expert on query performance either (more like a newbie). However, from what I've heard, subqueries in the WHERE clause are supposed to be bad for performance, while in the FROM clause they should be okay. So test it; I hope it's fast enough and fits the job.
As others mentioned, if you haven't got indexes on the products.ItemId and BrandCode columns, you should definitely add them. You should also consider whether requiring every item to have one hand-picked default would be okay, or maybe ditching the hand-picked defaults and always using random ones. Another thing to consider is whether you really need the data from a product when there is no default: could you live without the image URL, product name (use the item name?) and product code for those products?
Edit: One more possibility: You could change products.FlagDefault to items.DefaultProductId. That way it'd be easier to find out if an item has a default product, and it enforces only one default product per item.
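A rough sketch of how the items.DefaultProductId idea might look, again using Python's sqlite3 module on invented rows. The DefaultProductId column does not exist in the schema shown in the question, and the second item and the product names other than the merlot one are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    ItemId           INTEGER PRIMARY KEY,
    Name             TEXT,
    DefaultProductId INTEGER  -- hypothetical column; NULL means no hand-picked default
);
CREATE TABLE products (
    ProductId INTEGER PRIMARY KEY,
    ItemId    INTEGER,
    Name      TEXT
);
INSERT INTO items (ItemId, Name, DefaultProductId) VALUES
    (1, 'park avenue', 11),
    (2, 'item without default', NULL);
INSERT INTO products (ProductId, ItemId, Name) VALUES
    (10, 1, 'black calfskin'),
    (11, 1, 'merlot burnished calfskin'),
    (20, 2, 'brown suede');
""")

# One row per item: the join key is the single default stored on the item,
# so an item can never match more than one product.
QUERY = """
SELECT items.ItemId, items.Name, products.ProductId, products.Name
FROM items
LEFT JOIN products
       ON products.ProductId = items.DefaultProductId
ORDER BY items.ItemId
"""
rows = conn.execute(QUERY).fetchall()

# Items that still need a default are simply those with a NULL column.
missing = [r[0] for r in conn.execute(
    "SELECT ItemId FROM items WHERE DefaultProductId IS NULL")]
print(rows, missing)
```

Because the default lives in a single column on the item, the "more than one default per item" case cannot occur, and finding items without a default is a plain IS NULL check.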
SELECT
items.ItemId,
items.Name,
items.BrandCategoryId,
items.CatalogPage,
items.GenderId,
items.PriceRetail,
items.PriceSell,
items.PriceHold,
items.Descr,
items.FlagStatus as ItemFlagStatus,
T3.ImagetnURL,
T3.FlagDefault,
T3.ProductId,
T3.Code as ProductCode,
T3.Name as ProductName,
brands.Name as BrandName
FROM items INNER JOIN
(
SELECT DISTINCT
T1.ItemId,
T1.ImagetnURL,
T1.FlagDefault,
T1.ProductId,
T1.Code,
T1.Name,
T1.FlagStatus
FROM
products AS T1 LEFT JOIN
products AS T2 ON T1.ProductId = T2.ProductId
AND T2.FlagDefault = 'Y'
) AS T3 ON items.ItemId = T3.ItemId INNER JOIN
brands ON items.BrandCode = brands.Code
WHERE items.FlagStatus != 'U'
AND T3.FlagStatus != 'U'
I'm not sure I fully understand FlagStatus and FlagDefault. For an item with no default, do all its products have products.FlagDefault != 'Y'?
If yes, can you try this? It will (hopefully) return all items, with NULLs in the products fields for items with no default:
SELECT items.ItemId
, items.Name
, items.BrandCategoryId
, items.CatalogPage
, items.GenderId
, items.PriceRetail
, items.PriceSell
, items.PriceHold
, items.Descr
, items.FlagStatus as ItemFlagStatus
, products.ImagetnURL
, products.FlagDefault
, products.ProductId
, products.Code as ProductCode
, products.Name as ProductName
, brands.Name as BrandName
FROM items
LEFT JOIN
products
ON items.ItemId = products.ItemId
AND products.FlagDefault = 'Y'
JOIN
brands
ON items.BrandCode = brands.Code
;
The LEFT JOIN:
LEFT JOIN
products
ON items.ItemId = products.ItemId
AND products.FlagDefault = 'Y'
is equivalent to:
LEFT JOIN
( SELECT *
FROM products
WHERE products.FlagDefault = 'Y'
) AS p
ON items.ItemId = p.ItemId
So it does what you ask: it "filters down the results for the most specific table and only -then- joins upstream to ..."
With LEFT JOINs, the result can differ depending on whether you place the filtering conditions in the ON clause or later, after all the JOINs, in the WHERE clause.
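The difference between the two placements can be shown on a tiny example. This is only a sketch with invented rows, run through Python's sqlite3 module instead of MySQL; the LEFT JOIN semantics being illustrated are the same in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (ItemId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE products (
    ProductId   INTEGER PRIMARY KEY,
    ItemId      INTEGER,
    FlagDefault TEXT NOT NULL DEFAULT 'N'
);
-- Hypothetical rows: item 1 has a default product, item 2 does not.
INSERT INTO items (ItemId, Name) VALUES (1, 'has default'), (2, 'no default');
INSERT INTO products (ProductId, ItemId, FlagDefault) VALUES
    (10, 1, 'N'), (11, 1, 'Y'), (20, 2, 'N');
""")

# Filter in ON: products are narrowed first, then every item is kept.
FILTER_IN_ON = """
SELECT items.ItemId, products.ProductId
FROM items
LEFT JOIN products
       ON items.ItemId = products.ItemId
      AND products.FlagDefault = 'Y'
ORDER BY items.ItemId
"""

# Filter in WHERE: applied after the join, so rows without a default
# product are discarded and the LEFT JOIN behaves like an inner join.
FILTER_IN_WHERE = """
SELECT items.ItemId, products.ProductId
FROM items
LEFT JOIN products
       ON items.ItemId = products.ItemId
WHERE products.FlagDefault = 'Y'
ORDER BY items.ItemId
"""

on_rows = conn.execute(FILTER_IN_ON).fetchall()
where_rows = conn.execute(FILTER_IN_WHERE).fetchall()
print(on_rows)
print(where_rows)
```

With the condition in ON, item 2 survives with a NULL product; with the same condition in WHERE, item 2 disappears, which is the behaviour the original query suffers from.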
I am not sure about performance as you did not post table structure and sizes or explain plan, but how about a UNION between your first query (items with default products) and a query which fetches one product per item, only for items with no default product?
It's a bit long, but give it a shot - let me know if it gets you the correct data and how long it takes...
(SELECT items.ItemId
, items.Name
, items.BrandCategoryId
, items.CatalogPage
, items.GenderId
, items.PriceRetail
, items.PriceSell
, items.PriceHold
, items.Descr
, items.FlagStatus as ItemFlagStatus
, products.ImagetnURL
, products.FlagDefault
, products.ProductId
, products.Code as ProductCode
, products.Name as ProductName
, brands.Name as BrandName
FROM items
JOIN products ON items.ItemId = products.ItemId
JOIN brands ON items.BrandCode = brands.Code
WHERE items.FlagStatus != 'U'
AND products.FlagStatus != 'U'
AND products.FlagDefault = 'Y'
GROUP BY items.ItemId)
UNION
(SELECT items.ItemId
, items.Name
, items.BrandCategoryId
, items.CatalogPage
, items.GenderId
, items.PriceRetail
, items.PriceSell
, items.PriceHold
, items.Descr
, items.FlagStatus as ItemFlagStatus
, products.ImagetnURL
, products.FlagDefault
, products.ProductId
, products.Code as ProductCode
, products.Name as ProductName
, brands.Name as BrandName
FROM items
JOIN products ON items.ItemId = products.ItemId
JOIN brands ON items.BrandCode = brands.Code
WHERE items.FlagStatus != 'U'
AND products.FlagStatus != 'U'
AND products.FlagDefault != 'Y'
AND items.ItemId NOT IN
(SELECT DISTINCT itemId
FROM products
WHERE products.FlagDefault = 'Y'
AND products.ItemId IS NOT NULL) -- ItemId is nullable; a NULL here would make NOT IN match nothing
GROUP BY items.ItemId)