开发者

Querying multiple SQL tables for one type of objects

Given the following SQL table (MySQL for instance):

CREATE TABLE `table` (
  `id` int(11) unsigned NOT NULL,
  `lang` tinyint(3) unsigned NOT NULL,
  `data` text NOT NULL,
  PRIMARY KEY (`id`,`lang`)
) ENGINE=InnoDB

This table stores some data about objects with IDs id and data can be written in several languages = lang. Typical use case for this table is: we need to get data for some object with some id and where language is lang = 1 or at least lang = 5 or any other language if no rows with lang = 1 or 5 for id = 1 is found.

In other words, I want to get information about object with id = 1 on English or at least on Deutsch, but if not - any other language will be sufficient.

This is a rather simple query:

SELECT * FROM `table` WHERE `id` = 1
ORDER BY
  CASE WHEN `lang` = 1 THEN 1
       WHE开发者_JS百科N `lang` = 2 THEN 2
       ELSE 3
  END ASC
LIMIT 1

This query is quite fast and use only PRIMARY key with in-memory sorting.

The questions appears when we'd like to get such data for several objects with one query. The only one I can think of is something like:

SELECT id, (SUBQUERY TO GET DATA AS ABOVE WHERE id = tmp.id LIMIT 1) AS data
FROM (SUBQUERY TO SELECT ids) as tmp

That query will do it's job but it looks and feels ugly :/

And this is a first question: is a good and right way to do such things? Do anybody knows a better way to resolve such problems?

Now let's think about highload and really big tables of data. For example, let's assume that we gave 1,000,000 of objects with 5 to 15 languages for each. This is really huge table for MySQL so we will split one table for several more (say 20 tables across several servers). Now we have some simple hash-function (like id % N == 0) to know where the data for particular object is stored.

So, question #2: *How to make such requests across several tables (even in one database, from table_1 to table_5) if we already know where the data is?* I guess that this question can be answered only after first one :(

Several other questions about this topic: maybe the whole situation is wrong? Should we store such data some other way? Or maybe there are some other, more efficient ways to do it?


You suggest

SELECT id, (SUBQUERY TO GET DATA AS ABOVE WHERE id = tmp.id LIMIT 1) AS data
FROM (SUBQUERY TO SELECT ids) as tmp

A small-ish change, but surely it's a bit better to join to a resultset telling you the best language for each desired row.

SELECT
    `table`.`id`,
    `table`.`lang`,
    `table`.`data`
FROM
    `table`
    JOIN (
        SELECT
            `id`,
            MAX(`evaluatelanguage`(`lang`)) AS `bestscore`
        FROM
            `table` AS `sqtable`
        GROUP BY
            `id`
    ) AS `subquery` ON
        `table`.`id` = `subquery`.`id` AND
        `evaluatelanguage`(`table`.`lang`) = `subquery`.`bestscore`

NB. This version requires you to give a separate rank to every language, because otherwise you will see multiple rows where an id falls into the ELSE clause in your CASE. I think this query can be improved (surely we need not use evaluatelanguage so many times?) but I'm unsure how best to go about it.

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜