Beginner to sqlite trying to improve performance through database design, 'join' and 'where'

2023-03-26 03:31 问答作者：

I have a large table (potentially millions of rows) that in its naive form would contain lots of repetition, e.g.:

CREATE TABLE sales(
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    salesperson TEXT,
    customer    TEXT
);

INSERT INTO sales VALUES(NULL, "Rod", "Acme");
INSERT INTO sales VALUES(NULL, "Jane", "Xyz Corp");
INSERT INTO sales VALUES(NULL, "Freddy", "Acme");
<... many more lines containing significant repetitions of each salesperson and each customer >

I think I know enough that this is not the ideal approach when I want to perform queries such as:

SELECT count(*) FROM sales WHERE salesperson="Jane";

So my database should be rearranged as:

CREATE TABLE sales(
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    salesperson INTEGER,
    customer    INTEGER
);

CREATE TABLE salespeople(
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT
);

CREATE TABLE customers(
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT
);

In reality there are many more columns in the single-table solution, and therefore more distinct tables in the normalized (if that's even the right use of the term) version.

I've timed tests of a dumb query against the single-table v.s. a JOIN with a WHERE aga开发者_如何学Pythoninst the split-tables version and the "smart" version is about 25% slower, so clearly I'm missing something:

SELECT count(*) FROM sales INNER JOIN salespeople ON sales.salesperson=salespeople.id WHERE salespeople.name="Freddy";

If I lookup the salesperson.id first as a separate query, I see times ~33% faster than the dumb implementation (which is less than I was hoping for).

SELECT id FROM salespeople WHERE name="Freddy";

...then

SELECT count(*) FROM sales WHERE salesperson=previouslyLookedUpId;

I get the impression that the WHERE (in the multi-table solution) is being evaluated for every row, rather than once to determine the relevant salesperson id, as intended. Clearly my queries (or constraints, or indices?) aren't allowing the db engine to work efficiently.

I only know enough to know that I don't know enough... What is the right approach here?

First, see my comment about indexes.

Second, the goal of "normalizing" a database design is not performance. Often it is the case that a normalized design will perform more slowly than a denormalized or "flattened" table. Joins can be expensive in terms of resource consumption.

The goal of normalizing is the proper representation of the entities and their relationships; if your domain is accurately mapped to a normalized design it is impossible to get different answers to the same question. That's the goal.

Also, a normalized design is extensible. That is another major virtue and goal.

Since you use the word "smart". A normalized design is "simple-minded", not smart, though it takes smarts to attain simplicity.

P.S.

What kind of performance do you get, relative to a standard join, with this approach:

                 from sales
                 where salespersonid = (select id from SALESPEOPLE where name = 'Joe')

when SALESPEOPLE.name and salespersonid are indexed?

or this:

                ... from sales
                inner join
                (select id from SALESPEOPLE where name = 'Joe') as MYPEEPS
                on sales.salespersonid = MYPEEPS.id

继续阅读：join

Beginner to sqlite trying to improve performance through database design, 'join' and 'where'

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

王昌瑞《潜梦追凶》剧组庆生新锐演员未来可期？

Is it allowed to ask users to enter credit card details for own payment method?

Escaping "<" in Perl-generated XML

imessage会显示已读吗？

微信重新建群怎么建？