开发者

Multivalued attributes in Relational databases?

How well an idea are multi-valued attributes in a relational database when they are to be referred extensively?

Let me give you an example to show what I mean. Suppose I have the following table:

UserID          Attribute1

User1           a,b,c
User2           x,y,z
User3           a,x,y
User4           c,b,z
[a,b,c,x,y,z are to be strings]

There is another user User5 to whom I have to make some suggestions about other users based on whether his Attribute1 matches any one of other 4 users or not.

[In graph databases, the task could have been much easier as I could have created multiple nodes from the respective users using the same relationship.]

Now, this table is just a micro-level abstraction of what an actual database will look like. The number of rows in a table may run into hu开发者_StackOverflow社区ndreds of thousands, if not millions. Also, the multiple values may actually be a lot more than 3. Apart from this, the database can be under heavy load, and in that situation, there may be some issues.

So, are multi-valued attributes helpful in such cases? Or is there any better way of doing the same? One obvious way I can think of is to store it as:

UserID          Attribute1

User1           a
User1           b
User1           c
User2           x
User2           y
User2           z
User3           a
User3           x
User3           y
User4           c
User4           b
User4           z

Any faster way of dealing such situations in databases? Or are there any built-in features of modern-day databases to exploit?


Having multiple values in a field is only useful if the data is dead weight in the database, i.e. if you only read the field out of the database and process it afterwards.

As soon as you want to use the values in the field in a query, you will take a huge performance hit from having to parse the value to compare it. If you put the values in separate records as in your second example, so that you can add an index on it, it's not unrealistic that the query will be 10 000 times faster.

Having a million records in a table is not a problem. We have some tables that have over 100 million records in them.


Apart from what the others have said regarding normalization, I'd like to answer to the "Or any inbuilt feature of modern-day databses to exploit?" part of your question:

PostgreSQL has a pretty nifty extension called hstore which does exactly that and in a highly optimized manner.

The hstore data type is essentially a key/value pair, where you can store anything. In your example something like this:

INSERT INTO user_attributes
(user_id, , attributes)
VALUES
(1, ('att1 => x, att2 => y'));

Will insert the keys att1 and att2 into the column attributes. This can be indexed to make lookups fast.

You can query the data using this syntax:

SELECT *
FROM user_attributes
WHERE attributes @> ('att1 => "Some Value"')

This will return all rows that have a key named att1 and where that is mapped to the value "Some Value". The above statement will use an existing index on the column, so the lookup is nearly as fast as with a "real" column. The above statement takes ~2ms on my laptop to find a row in a table with 100.000 rows.

You can also query for rows that have a specific attribute defined regardless of the value:

SELECT user_id,
       (attributes -> 'att1')
FROM user_attributes
WHERE attributes ? 'att1'

will find all rows where att1 is defined and will output the value for those.


For a n-n table you could normalize it to 3 tables (in a transactional model) users - user_attribute - attributes where the user_attribute table consists out of the primary key of users and attributes.. Keys are usually indexed and therefore quite fast for read ops

EDIT AFTER QUESTION

Users
int Id PrimaryKey
string name

User_Attribute
UserId PrimaryKey (FK to Users.Id)
AttributeId PrimaryKey (FK to Attributes.Id)

Attributes
int Id PrimaryKey
Value

this would result in a table holding only the users, a table holding only the attributes and a table holding which user is holding what

for instance

   Users      User_Attribute      Attrubutes      
id  Name   UserId AttributeId  Id Value
1   User1  1      1            1  Att1
2   User2  1      2            2  Att2
           2      1            3  Att3  
           2      3
0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜