开发者

How to implement filter system in SQL?

Right now I am planning to add a filter system to my site.

Examples:

(ID=apple, COLOR=red, TASTE=sweet, ORIGIN=US)
(ID=mango, COLOR=yellow, TASTE=sweet, ORIGIN=MEXICO)
(ID=banana, COLOR=yellow, TASTE=bitter-sweet, ORIGIN=US)

so now I am interested in doing the following: SELECT ID FROM thisTable WHERE COLOR='yellow' AND TASTE='SWEET'

But my problem is I am doing this for multiple categories in my site, and the columns are NOT consistent. (like if the table is for handphones, then it will be BRAND, 3G-ENABLED, PRICE, COLOR, WAVELENGTH, etc)

how could I design a genera开发者_如何学编程l schema that allows this?

Right now I am planning on doing:

table(ID, KEY, VALUE)

This allows arbitary number of columns, but for the query, I am using SELECT ID FROM table WHERE (KEY=X1 AND VALUE=V1) AND (KEY=X2 AND VALUE=V2), .. which returns an empty set.

Can someone recommend a good solution to this? Note that the number of columns WILL change regularly


The entity-attribute-value model that you suggest could fit in this scenario.

Regarding the filtering query, you have to understand that with the EAV model you will sacrifice plenty of query power, so this can become quite tricky. However this one way to tackle your problem:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches
           FROM      table
           WHERE     (`key` = X1 AND `value` = V1) OR 
                     (`key` = X2 AND `value` = V2) 
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

One inelegant feature of this approach is that you need to specify the number of attribute/value pairs that you expect to match in sub_t.matches = 2. If we had three conditions we would have had to specify sub_t.matches = 3, and so on.

Let's build a test case:

CREATE TABLE stuff (`id` varchar(20), `key` varchar(20), `value` varchar(20));

INSERT INTO stuff VALUES ('apple',  'color',  'red');
INSERT INTO stuff VALUES ('mango',  'color',  'yellow');
INSERT INTO stuff VALUES ('banana', 'color',  'yellow');

INSERT INTO stuff VALUES ('apple',  'taste',  'sweet');
INSERT INTO stuff VALUES ('mango',  'taste',  'sweet');
INSERT INTO stuff VALUES ('banana', 'taste',  'bitter-sweet');

INSERT INTO stuff VALUES ('apple',  'origin',  'US');
INSERT INTO stuff VALUES ('mango',  'origin',  'MEXICO');
INSERT INTO stuff VALUES ('banana', 'origin',  'US');

Query:

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 2 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.02 sec)

Now let's insert another fruit with color=yellow and taste=sweet:

INSERT INTO stuff VALUES ('pear', 'color', 'yellow');
INSERT INTO stuff VALUES ('pear', 'taste', 'sweet');
INSERT INTO stuff VALUES ('pear', 'origin', 'somewhere');

The same query would return:

+-------+
| id    |
+-------+
| mango |
| pear  |
+-------+
2 rows in set (0.00 sec)

If we want to restrict this result to entities with origin=MEXICO, we would have to add another OR condition and check for sub_t.matches = 3 instead of 2.

SELECT    stuff.id 
FROM      stuff 
JOIN      (SELECT    COUNT(*) matches, id
           FROM      stuff
           WHERE     (`key` = 'color' AND `value` = 'yellow') OR 
                     (`key` = 'taste' AND `value` = 'sweet') OR 
                     (`key` = 'origin' AND `value` = 'MEXICO')
           GROUP BY  id
          ) sub_t ON (sub_t.matches = 3 AND sub_t.id = stuff.id)
GROUP BY  stuff.id;

Result:

+-------+
| id    |
+-------+
| mango |
+-------+
1 row in set (0.00 sec)

As in every approach, there are certain advantages and disadvantages when using the EAV model. Make sure you research the topic extensively in the context of your application. You may even want to consider an alternative relational databases, such as Cassandra, CouchDB, MongoDB, Voldemort, HBase, SimpleDB or other key-value stores.


The following worked for me:

SELECT * FROM mytable t WHERE 
    t.key = "key" AND t.value = "value" OR
    t.key = "key" AND t.value = "value" OR
    ....
    t.key = "key" AND t.value = "value"
GROUP BY t.id having count(*)=3;

count(*)=3 must match the amount of

t.key = "key" AND t.value = "value"

cases


What you are suggesting is known as an Entity-Attribute-Value structure and is highly discouraged. One of the (many) big problems with EAV designs for example is in data integrity. How you do enforce that colors only consist of "red", "yellow", "blue" etc? In short, you can't without a lot of hacks. Another problem rears itself in querying (as you have seen) and searching for data.

Instead, I would recommend creating a table that represents each type of entity and thus each table can have attributes (columns) that are specific to that type of entity.

In order to convert the data into columns in a result query as you are seeking, you will need to create what is often called a crosstab query. There are report engines that will do it and you can do it code but most database products will not do it natively (meaning without building the SQL string manually). The performance of course will not be good if you have a lot of data and you will run into problems filtering on the data. For example, suppose that some of the values are supposed to be numeric. Because the value part of the EAV is likely to be a string, you will have to cast those values to an integer before you can filter on them and that presumes that the data will be convertible to an integer.


The price you pay for simplistic table design at this stage will cost you in terms of performance in the long run. Using ORM to reduce the cost of modifying the database to fit data in an appropriate structure would probably be a good time investment, even in spite of ORM's performance cost.

Otherwise, you may want to look for a "reverse ORM" that maps the code from your database, which has the benefit of being less expensive and having higher performance. (Slightly higher starting cost compared to ORM, but better long-term performance and reliability.)

It's a costly problem regardless of how you slice it. Do you want to pay now with development time or pay later when your performance tanks? ("Pay later" is the wrong answer.)

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜