ClickHouse bloom filter index seems too slow

I executed the following query, but it processed ~1B rows and took a total of 75 seconds for a simple count:

SELECT count(*)
FROM events_distributed
WHERE (orgId = '174a4727-1116-4c5c-8234-ab76f2406c4a') AND (timestamp >= '2022-12-05 00:00:00.000000000')

Query id: e4312ff5-6add-4757-8deb-d68e0f3e29d9

┌──count()─┐
│ 13071204 │
└──────────┘

1 row in set. Elapsed: 74.951 sec. Processed 979.00 million rows, 8.26 GB (13.06 million rows/s., 110.16 MB/s.)

I am wondering how I can speed this up. My events table uses the following PARTITION BY and ORDER BY clauses, plus a bloom filter index on orgid:

PARTITION BY toDate(timestamp)
ORDER BY (timestamp)

INDEX idx_orgid orgid TYPE bloom_filter(0.01) GRANULARITY 1

Below is the execution plan:

EXPLAIN indexes = 1
SELECT count(*)
FROM events_distributed
WHERE (orgid = '174a4727-1116-4c5c-8234-ab76f240fc4a') AND (timestamp >= '2022-12-05 00:00:00.000000000') AND (timestamp <= '2022-12-06 00:00:00.000000000')

Query id: 879c2ce5-c4c7-4efc-b0e2-25613848afad

┌─explain────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                                                │
│   MergingAggregated                                                                                        │
│     Union                                                                                                  │
│       Aggregating                                                                                          │
│         Expression (Before GROUP BY)                                                                       │
│           Filter (WHERE)                                                                                   │
│             ReadFromMergeTree (users.events)                                                               │
│             Indexes:                                                                                       │
│               MinMax                                                                                       │
│                 Keys:                                                                                      │
│                   timestamp                                                                                │
│                 Condition: and((timestamp in (-Inf, '1670284800']), (timestamp in ['1670198400', +Inf)))   │
│                 Parts: 12/342                                                                              │
│                 Granules: 42122/407615                                                                     │
│               Partition                                                                                    │
│                 Keys:                                                                                      │
│                   toDate(timestamp)                                                                        │
│                 Condition: and((toDate(timestamp) in (-Inf, 19332]), (toDate(timestamp) in [19331, +Inf))) │
│                 Parts: 12/12                                                                               │
│                 Granules: 42122/42122                                                                      │
│               PrimaryKey                                                                                   │
│                 Keys:                                                                                      │
│                   timestamp                                                                                │
│                 Condition: and((timestamp in (-Inf, '1670284800']), (timestamp in ['1670198400', +Inf)))   │
│                 Parts: 12/12                                                                               │
│                 Granules: 30696/42122                                                                      │
│               Skip                                                                                         │
│                 Name: idx_orgid                                                                            │
│                 Description: bloom_filter GRANULARITY 1                                                    │
│                 Parts: 8/12                                                                                │
│                 Granules: 20556/30696                                                                      │
│       ReadFromRemote (Read from remote replica)                                                            │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

32 rows in set. Elapsed: 0.129 sec. 

How can I speed up this query? Processing 1B rows to return a count of 13M suggests something is totally off. Would creating a set index on orgid be any better? I will have at most 10K orgs.
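
For reference, I believe the set index I have in mind would be declared roughly as below. This is only a sketch: the local table name events and the set(10000) bound (sized to my ~10K orgs) are assumptions.

-- Hypothetical set index on orgid; set(10000) caps the distinct
-- values stored per indexed block of granules.
ALTER TABLE events ADD INDEX idx_orgid_set orgid TYPE set(10000) GRANULARITY 1;
-- Existing parts are not indexed until the index is materialized:
ALTER TABLE events MATERIALIZE INDEX idx_orgid_set;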


Your query doesn't use the primary index for the orgId filter, because orgId is not part of the sorting key.

I suggest using

PARTITION BY toDate(timestamp)
ORDER BY (orgId, timestamp)

https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/

And remove the bloom_filter index.
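
A minimal sketch of the suggested local table, assuming the column names from the question and DateTime64(9) for timestamp (an assumption based on the nanosecond literals; all other event columns omitted):

CREATE TABLE events
(
    orgId String,
    timestamp DateTime64(9)
    -- ... remaining event columns ...
)
ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (orgId, timestamp);

Note that the sorting key of an existing MergeTree table can't be changed to start with a new column, so this means creating a new table and copying the data over, e.g. with INSERT INTO ... SELECT. With orgId leading the sorting key, the primary index can skip every granule that doesn't contain the org, instead of scanning all rows in the matching partitions.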
