NULL-keys for key/value table
(Using Oracle)
I have a table with key/value pairs like this:
create table MESSAGE_INDEX
(
KEY VARCHAR2(256) not null,
VALUE VARCHAR2(4000) not null,
MESSAGE_ID NUMBER not null
)
I now want to find all the messages where key = 'someKey' and value is 'val1', 'val2' or 'val3' - OR value is null in which case there will be no entry in the table at all. This is to save space; there would be a large number of keys with null values if I stored them all.
I think this works:
SELECT message_id
FROM message_index idx
WHERE ((key = 'someKey' AND value IN ('val1', 'val2', 'val3'))
OR NOT EXISTS (SELECT 1 FROM message_index WHERE key = 'someKey'
AND idx.message_id = message_id))
But is is extremely slow. Takes 8 seconds with 700K records in message_index and there will be many more records and more search criteria when moving outside of my test environment.
Primary key is key, value, message_id:
add constraint PK_KEY_VALUE primary key (KEY, VALUE, MESSAGE_ID)
And I added another index for message_id, to speed up searching for missing keys:
create index IDX_MESSAGE_ID on MESSAGE_INDEX (MESSAGE_ID)
I will be doing several of these key/value lookups in every search, not just one as shown above. So far I am doing them nested, where output id's of one level is the input to the next. E.g.:
SELECT message_id from message_index
WHERE (key/value compare)
AND message_id IN
(
SELECT ... and开发者_如何学编程 so on
)
What can I do to speed this up?
"What can I do to speed this up?"
Use a normalized data model rather than a key-value store. Reconstructing the (especially optional) attributes of a message is going to be a continual performance bugbear.
If you have a key that all messages are guaranteed to have:
SELECT message_id
FROM message_index mi
WHERE mi.key = 'GuaranteedKey'
AND mi.message_id IN
(
SELECT message_id
FROM message_index mk
WHERE mk.key = 'someKey'
AND mk.value IN (1, 2, 3)
)
UNION ALL
SELECT message_id
FROM message_index mi
WHERE mi.key = 'GuaranteedKey'
AND mi.message_id NOT IN
(
SELECT message_id
FROM message_index mk
WHERE mk.key = 'someKey'
)
If you don't:
WITH mi AS
(
SELECT DISTINCT message_id
FROM message_index
)
SELECT message_id
FROM mi
WHERE mi.message_id IN
(
SELECT message_id
FROM message_index mk
WHERE mk.key = 'someKey'
AND mk.value IN (1, 2, 3)
)
UNION ALL
SELECT message_id
FROM mi
WHERE mi.message_id NOT IN
(
SELECT message_id
FROM message_index mk
WHERE mk.key = 'someKey'
)
To speed this up you convert subselects into joins, so your query would become something like this:
SELECT idx.message_id
FROM message_index idx
LEFT JOIN message_index idx2 ON idx2.message_id = idx.message_id AND idx2.key = 'someKey'
WHERE (idx.key = 'someKey' AND idx.value IN ('val1', 'val2', 'val3'))
OR idx2.message_id IS NULL
I'm not sure your second filter is what you are looking for. Basically the subquery:
(SELECT 1
FROM message_index
WHERE key = 'someKey'
AND idx.message_id = message_id)
won't contain rows only if there is no key 'someKey'
for that message_id
in the table.
If this is really what you want, and since all columns are NOT NULL, you could rewrite the query with a NOT IN (that will probably be optimized into an HASH ANTI-JOIN):
SELECT message_id
FROM message_index idx
WHERE (key = 'someKey' AND VALUE IN ('val1', 'val2', 'val3'))
UNION ALL
SELECT message_id
FROM message_index
WHERE message_id NOT IN (SELECT message_id
FROM message_index
WHERE key = 'someKey');
精彩评论