SQLite join optimisation

If you have a query such as:

select a.Name, a.Description from a
inner join b on a.id1 = b.id1
inner join c on b.id2 = c.id2
group by a.Name, a.Description

What would be the best columns to index for this query in SQLite, given that there are over 100,000 rows in each of the tables?

The reason I ask is that, when I apply the same optimisation, the query with the GROUP BY does not give me the performance I would expect from another RDBMS (SQL Server).

Would I be right in thinking that all columns referenced on a single table in a query in SQLite need to be included in a single composite index for best performance?


The problem is that you're expecting SQLite to have the same performance characteristics as a full client/server RDBMS. It won't. SQLite doesn't have the luxury of caching as much in memory, has to rebuild its cache every time you run the application, is probably limited to a set number of cores, and so on. These are the tradeoffs of using an embedded RDBMS instead of a full one.

As far as optimizations go, try indexing the lookup columns and test. Then try creating a covering index. Be sure to test both the selects and the code paths that update the database, since you're speeding up one at the expense of the other. Find the indexing that gives the best balance between the two for your needs and go with it.
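For instance, using the table and column names from the question (the index names here are only illustrative), step one would be plain lookup indexes on the join columns, and step two a covering variant:

-- step 1: plain lookup indexes on the columns used to probe b and c
CREATE INDEX idx_b_id1 ON b(id1);
CREATE INDEX idx_c_id2 ON c(id2);

-- step 2: covering variant for b, so the join to c can be answered
-- from the index alone without fetching b's table rows
CREATE INDEX idx_b_id1_id2 ON b(id1, id2);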


From the SQLite query optimization overview:

When doing an indexed lookup of a row, the usual procedure is to do a binary search on the index to find the index entry, then extract the rowid from the index and use that rowid to do a binary search on the original table. Thus a typical indexed lookup involves two binary searches. If, however, all columns that were to be fetched from the table are already available in the index itself, SQLite will use the values contained in the index and will never look up the original table row. This saves one binary search for each row and can make many queries run twice as fast.

For any other RDBMS, I'd say to put a clustered index on b.id1 and c.id2. For SQLite, you might be better off also including any columns from b and c that you want to look up in those indexes.
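As a rough sketch of that for this particular query (index names are placeholders): only the join columns are read from b (id1 and id2) and from c (id2), so two narrow indexes are already covering:

-- b is probed by id1 and only needs to supply id2, so one composite index covers it
CREATE INDEX idx_b_id1_id2 ON b(id1, id2);

-- c is probed by id2 and nothing else is read from it
CREATE INDEX idx_c_id2 ON c(id2);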


Beware: I know nothing of possible intricacies of SQLite and its execution plans.

You definitely need indexes on a.id1, b.id1, b.id2 and c.id2. I think a composite index (b.id1, b.id2) could yield a small performance increase. The same goes for (a.id1, a.Name, a.Description).
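In SQLite syntax those composite indexes would be created roughly like this (the index names are only illustrative):

-- composite index covering both join columns of b
CREATE INDEX idx_b_id1_id2 ON b(id1, id2);

-- composite index on a that also carries the selected columns
CREATE INDEX idx_a_id1_name_desc ON a(id1, Name, Description);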


Since you're not using the other tables for your return columns, perhaps this will be faster:

SELECT DISTINCT a.Name, a.Description
FROM a, b, c
WHERE a.id1 = b.id1
AND b.id2 = c.id2

Looking at the returned columns, since the only criterion seems to be that rows must be linked from a through b to c, you could simply look for all unique a.Name and a.Description pairs.

SELECT DISTINCT a.Name, a.Description
FROM a
WHERE a.id1 IN (
 SELECT b.id1
 FROM b
 WHERE b.id2 IN (
  SELECT c.id2
  FROM c
  )
 )

Or, depending on whether every a.Name and a.Description pair is already unique, there may be some gain in first finding the distinct ids and then fetching the other columns.

SELECT a.Name, a.Description
FROM a 
WHERE a.id1 IN (
 SELECT DISTINCT a.id1
 FROM a
 WHERE a.id1 IN (
  SELECT b.id1
  FROM b
  WHERE b.id2 IN (
   SELECT c.id2
   FROM c
   )
  )
 )


I think indexes on a.id1 and b.id2 would give you about as much benefit as you could get in terms of the JOINs. But SQLite offers EXPLAIN, and it might help you determine whether there's an avoidable inefficiency in the current execution plan.
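For example, EXPLAIN QUERY PLAN gives a more readable summary than raw EXPLAIN; run it against the original query to see whether each table is handled with a full SCAN or a SEARCH using one of your indexes:

EXPLAIN QUERY PLAN
SELECT a.Name, a.Description
FROM a
INNER JOIN b ON a.id1 = b.id1
INNER JOIN c ON b.id2 = c.id2
GROUP BY a.Name, a.Description;

Depending on the SQLite version, a line such as SEARCH ... USING COVERING INDEX means the table row itself is never fetched, which is exactly what the covering-index suggestions above aim for.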
