Index on a table you must fully scan anyway? (MySQL)

2023-02-03 07:42 问答作者：

I'm pretty stumped here.

I have 2 tables and I'm left joining the first (around 500k records) with the second (around 2.2 million records) in order to find out which records are in the first and not the second. (typical "b.attribute is null开发者_C百科" nonsense)

Why (how) is it that an index is utilized on the first table? It's going to have to go through EVERY record in the first table anyway, and yet when I try to do this join without any index (or primary key.. none needed because this is all just ETL) on the first table, it crawls.

using innodb, by the way.

Help?

EDIT : the 2nd table is indexed. The first wasn't.

This should shed some light on it: http://dev.mysql.com/doc/refman/5.5/en/innodb-index-types.html

In short: All InnoDB tables have so called 'clustered index' (even if no explicit index is defined on the table, InnoDB creates it automatically), in which actual rows are stored.

I have no idea if this is what is happening, but it would, in theory, be possible (depending on the actual query) for the database engine to be scanning the index for the left table rather than the table itself. It could construct the necessary key data for that. If scanning the index were faster than scanning the table, that could account for the speed difference.

The purpose of the primary index is put things in order by sorting and creating a big tree (at least in SQL Server). B-tree if to be more specific. This means each record's key belongs to some place (or bucket) in the tree.

Index on a table you must fully scan anyway? (MySQL)

So why adding a key to the FIRST table helps to speed up the query? The reason is that when the query is executed, the FIRST table is being sorted since the SECOND table is already sorted due to presence of a primary key. This is due to the simple fact that comparing two sorted lists is much faster than doing binary search for each element. In this case, since there is no index, sorting takes time.

By the way, don't be confused by what I say. It's not really comparing lists, but more pruning the index tree on the above picture, e.g. if the T1 has K1, K2, K3 and K1 can be found in second bucket on the picture then there is no need to check first bucket for the rest of the keys.

MySQL doesn't have hash joins.

继续阅读：database-agnostic indexing

Index on a table you must fully scan anyway? (MySQL)

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集 河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？