Querying multiple SQL tables for one type of objects
Given the following SQL table (MySQL for instance):
CREATE TABLE `table` (
`id` int(11) unsigned NOT NULL,
`lang` tinyint(3) unsigned NOT NULL,
`data` text NOT NULL,
PRIMARY KEY (`id`,`lang`)
) ENGINE=InnoDB
This table stores some data about objects with IDs id
and data
can be written in several languages = lang
. Typical use case for this table is: we need to get data for some object with some id and where language is lang = 1 or at least lang = 5 or any other language if no rows with lang = 1 or 5 for id = 1 is found.
In other words, I want to get information about object with id = 1 on English or at least on Deutsch, but if not - any other language will be sufficient.
This is a rather simple query:
SELECT * FROM `table` WHERE `id` = 1
ORDER BY
CASE WHEN `lang` = 1 THEN 1
WHE开发者_JS百科N `lang` = 2 THEN 2
ELSE 3
END ASC
LIMIT 1
This query is quite fast and use only PRIMARY key with in-memory sorting.
The questions appears when we'd like to get such data for several objects with one query. The only one I can think of is something like:
SELECT id, (SUBQUERY TO GET DATA AS ABOVE WHERE id = tmp.id LIMIT 1) AS data
FROM (SUBQUERY TO SELECT ids) as tmp
That query will do it's job but it looks and feels ugly :/
And this is a first question: is a good and right way to do such things? Do anybody knows a better way to resolve such problems?
Now let's think about highload and really big tables of data. For example, let's assume that we gave 1,000,000 of objects with 5 to 15 languages for each. This is really huge table for MySQL so we will split one table for several more (say 20 tables across several servers). Now we have some simple hash-function (like id % N == 0) to know where the data for particular object is stored.
So, question #2: *How to make such requests across several tables (even in one database, from table_1 to table_5) if we already know where the data is?* I guess that this question can be answered only after first one :(
Several other questions about this topic: maybe the whole situation is wrong? Should we store such data some other way? Or maybe there are some other, more efficient ways to do it?
You suggest
SELECT id, (SUBQUERY TO GET DATA AS ABOVE WHERE id = tmp.id LIMIT 1) AS data
FROM (SUBQUERY TO SELECT ids) as tmp
A small-ish change, but surely it's a bit better to join to a resultset telling you the best language for each desired row.
SELECT
`table`.`id`,
`table`.`lang`,
`table`.`data`
FROM
`table`
JOIN (
SELECT
`id`,
MAX(`evaluatelanguage`(`lang`)) AS `bestscore`
FROM
`table` AS `sqtable`
GROUP BY
`id`
) AS `subquery` ON
`table`.`id` = `subquery`.`id` AND
`evaluatelanguage`(`table`.`lang`) = `subquery`.`bestscore`
NB. This version requires you to give a separate rank to every language, because otherwise you will see multiple rows where an id falls into the ELSE
clause in your CASE
. I think this query can be improved (surely we need not use
so many times?) but I'm unsure how best to go about it.evaluatelanguage
精彩评论