Speeding up SQL query when sorting on foreign keys

2022-12-10 06:18 问答作者：

This is more of a generic SQL problem but I'm using Firebird 2.5 if anyone knows of a Firebird/Interbase specific optimization. First, the following is a simplified example schema to illustrate the issue I'm trying to solve:

CREATE TABLE users
(
   id INTEGER PRIMARY KEY,
   name VARCHAR(16)
);

CREATE TABLE data_set
(
   id INTEGER PRIMARY KEY,
   name VARCHAR(64)
);

CREATE UNIQUE INDEX data_set_name_idx ON data_set(name);

CREATE TABLE data
(
   user_id INTEGER,
   data_set_id INTEGER,
   data BLOB,
   PRIMARY KEY(user_id, data_set_id)
);

CREATE INDEX data_user_id_idx ON data(user_id);
CREATE INDEX data_data_set_id_idx ON data(data_set_id);

The query I'm trying to run is as follows:

SELECT users.name, data_set.name, data FROM users, data_set, data
WHERE user_id=XXX AND user_id=users.id AND data_set_id=data_set.id
ORDER BY data_set.name;

With 'XXX' being filled in with the user_id I want. So what I'm doing is selecting all the rows from the data table that are owned by a particular user and I'm sorting the results b开发者_如何学运维ased on the data_set name.

This works as it is but the problem is the data table has over a billion rows in it and the data_set table is not small either. The result set for a single user id may be many hundreds of millions of rows. What happens is that in order for the ORDER BY to work the database has to create a massive amount of temporary data which is incredibly slow and uses a lot of disk space. Without the ORDER BY it's fast but obviously not sorted like I need.

One solution would be to take the data_set.name values and just put them in a varchar column in data. Then that could be indexed and would be quick to sort. The problem with this approach is that it will have a lot of duplicate data and make the database absolutely massive.

Another solution would be something like an Indexed View or an indexed Computed Column. As far as I know neither of those is supported by Firebird.

Any other ideas?

This is rather speculative, but I wonder if this could be restructured as:

A cartesian product between users and data set, including the predicate on user.
Order by data set name
Joining to data

... would be more efficient, especially if you were only interested in the forst rows of the query.

In Oracle I'd think that it would not be because the nested loop join would be much less efficient than a hash join, but I'm not familiar with firebird at all I'm afraid.

Why not index data_set.name?

Also, I would skip primary key definition for facts table (data) and put two separate indexes for foreign keys, to speed up joins. (of course, indexing may have impact on insertions, if you have a large number of records to be inserted).

If you need to ensure unique constraint of the facts table, you could do it from the job transferring data into that table (I don't have details about that :).

Try to define an index on data_set(id, name) and experiment with it - maybe in combination with other suggestions here. You can change your existing UNIQUE index to a UNIQUE CONSTRAINT if your requirements dictate it and Firebird supports unique constraints.

继续阅读：firebird firebird2.5 foreign-keys query-optimization sql

Speeding up SQL query when sorting on foreign keys

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？