How to implement filter system in SQL?

2022-12-28 20:49 问答作者：

Right now I am planning to add a filter system to my site.

Examples:

(ID=apple, COLOR=red, TASTE=sweet, ORIGIN=US)
(ID=mango, COLOR=yellow, TASTE=sweet, ORIGIN=MEXICO)
(ID=banana, COLOR=yellow, TASTE=bitter-sweet, ORIGIN=US)

so now I am interested in doing the following: SELECT ID FROM thisTable WHERE COLOR='yellow' AND TASTE='SWEET'

But my problem is I am doing this for multiple categories in my site, and the columns are NOT consistent. (like if the table is for handphones, then it will be BRAND, 3G-ENABLED, PRICE, COLOR, WAVELENGTH, etc)

how could I design a genera开发者_如何学编程l schema that allows this?

Right now I am planning on doing:

table(ID, KEY, VALUE)

This allows arbitary number of columns, but for the query, I am using SELECT ID FROM table WHERE (KEY=X1 AND VALUE=V1) AND (KEY=X2 AND VALUE=V2), .. which returns an empty set.

Can someone recommend a good solution to this? Note that the number of columns WILL change regularly

The entity-attribute-value model that you suggest could fit in this scenario.

Regarding the filtering query, you have to understand that with the EAV model you will sacrifice plenty of query power, so this can become quite tricky. However this one way to tackle your problem:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches
           FROM      table
           WHERE     (`key` = X1 AND `value` = V1) OR 
                     (`key` = X2 AND `value` = V2) 
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

One inelegant feature of this approach is that you need to specify the number of attribute/value pairs that you expect to match in sub_t.matches = 2. If we had three conditions we would have had to specify sub_t.matches = 3, and so on.

Let's build a test case:

CREATE TABLE stuff (`id` varchar(20), `key` varchar(20), `value` varchar(20));

INSERT INTO stuff VALUES ('apple',  'color',  'red');
INSERT INTO stuff VALUES ('mango',  'color',  'yellow');
INSERT INTO stuff VALUES ('banana', 'color',  'yellow');

INSERT INTO stuff VALUES ('apple',  'taste',  'sweet');
INSERT INTO stuff VALUES ('mango',  'taste',  'sweet');
INSERT INTO stuff VALUES ('banana', 'taste',  'bitter-sweet');

INSERT INTO stuff VALUES ('apple',  'origin',  'US');
INSERT INTO stuff VALUES ('mango',  'origin',  'MEXICO');
INSERT INTO stuff VALUES ('banana', 'origin',  'US');

Query:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.02 sec)

Now let's insert another fruit with color=yellow and taste=sweet:

INSERT INTO stuff VALUES ('pear', 'color', 'yellow');
INSERT INTO stuff VALUES ('pear', 'taste', 'sweet');
INSERT INTO stuff VALUES ('pear', 'origin', 'somewhere');

The same query would return:

+-------+
| id    |
+-------+
| mango |
| pear  |
+-------+
2 rows in set (0.00 sec)

If we want to restrict this result to entities with origin=MEXICO, we would have to add another OR condition and check for sub_t.matches = 3 instead of 2.

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet') OR 
                     (`key` = 'origin' AND `value` = 'MEXICO')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 3 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.00 sec)

As in every approach, there are certain advantages and disadvantages when using the EAV model. Make sure you research the topic extensively in the context of your application. You may even want to consider an alternative relational databases, such as Cassandra, CouchDB, MongoDB, Voldemort, HBase, SimpleDB or other key-value stores.

The following worked for me:

SELECT * FROM mytable t WHERE 
    t.key = "key" AND t.value = "value" OR
    t.key = "key" AND t.value = "value" OR
    ....
    t.key = "key" AND t.value = "value"
GROUP BY t.id having count(*)=3;

count(*)=3 must match the amount of

t.key = "key" AND t.value = "value"

cases

What you are suggesting is known as an Entity-Attribute-Value structure and is highly discouraged. One of the (many) big problems with EAV designs for example is in data integrity. How you do enforce that colors only consist of "red", "yellow", "blue" etc? In short, you can't without a lot of hacks. Another problem rears itself in querying (as you have seen) and searching for data.

Instead, I would recommend creating a table that represents each type of entity and thus each table can have attributes (columns) that are specific to that type of entity.

In order to convert the data into columns in a result query as you are seeking, you will need to create what is often called a crosstab query. There are report engines that will do it and you can do it code but most database products will not do it natively (meaning without building the SQL string manually). The performance of course will not be good if you have a lot of data and you will run into problems filtering on the data. For example, suppose that some of the values are supposed to be numeric. Because the value part of the EAV is likely to be a string, you will have to cast those values to an integer before you can filter on them and that presumes that the data will be convertible to an integer.

The price you pay for simplistic table design at this stage will cost you in terms of performance in the long run. Using ORM to reduce the cost of modifying the database to fit data in an appropriate structure would probably be a good time investment, even in spite of ORM's performance cost.

Otherwise, you may want to look for a "reverse ORM" that maps the code from your database, which has the benefit of being less expensive and having higher performance. (Slightly higher starting cost compared to ORM, but better long-term performance and reliability.)

It's a costly problem regardless of how you slice it. Do you want to pay now with development time or pay later when your performance tanks? ("Pay later" is the wrong answer.)

继续阅读：sql

How to implement filter system in SQL?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？