PostgreSQL 10分区表及性能测试报告小结
目录
- 一、 测试环境
- 二、 编译安装PostgreSQL 10
- range分区表
- list分区表
- 多级分区表
- 使用ALTER TABLE xxx ATTACH[DETACH] PARTITION 增加或删除分区
- 添加外部表作为分区表
- 四、建立测试业务表
- 五、性能测试
- 数据导入
- 查询某个时间范围的数据
- 查询某个月里某个用户数据--直接从cache里取数据
- 索引维护
- 删除整个分区数据
- 全表扫描
- 增加新的分区并导入数据
作者简介:
中国比较早的postgresql使用者,2001年就开始使用postgresql,自2003年底至2014年一直担任PGSQL中国社区论坛PostgreSQL的论坛板块版主、管理员,参与Postgresql讨论和发表专题文章7000多贴.拥有15年的erp设计,开发和实施经验,开源mrp系统PostMRP就是我的作品,该应用软件是一套基于Postgresql专业的制造业管理软件系统.目前任职于--中国第一物流控股有限公司/运力宝(北京)科技有限公司,为公司的研发部经理
一、 测试环境
操作系统:CentOS 6.4
Postgresql版本号:10.0
CPU:Intel(R) Xeon(R) CPU E5-2407 v2 @ 2.40GHz 4核心 4线程
内存:32G
硬盘:2T SAS 7200
二、 编译安装PostgreSQL 10
--编译安装及初始化
[root@ad source]# git clone git://git.postgresql.org/git/postgresql.git [root@ad source]# cd postgresql [root@ad source]# ./configure --prefix=/usr/local/pgsql10 [root@ad postgresql]# gmake -j 4 [root@ad postgresql]# gmake install [root@ad postgresql]# su postgres [postgres@ad postgresql]# /usr/local/pgsql10/bin/initdb --no-locale -E utf8 -D /home/postgres/data10/ -U postgres
--修改一些参数
postgresql.conf listen_addresses = '*' port = 10000 shared_buffers = 8096MB maintenance_work_mem = 512MB effective_cache_size = 30GB log_destination = 'csvlog' logging_collector = on log_directory = 'log' log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' log_file_mode = 0600 log_checkpoints = off log_connections = off log_disconnections = off log_duration = off log_line_prefix = '%m %h %a %u %d %x [%p] ' log_statement = 'none' log_timezone = 'PRC' track_activity_query_size = 4096 max_wal_size = 32GB min_wal_size = 2GB checkpoint_completion_target = 0.5
pg_hba.conf增加许可条目
host all all 192.168.1.0/24 trust
--启动服务
[postgres@ad data10]$ /usr/local/pgsql10/bin/pg_ctl start -D /home/postgres/data10/ --连接数据库 [postgres@ad data10]$ /usr/local/pgsql10/bin/psql -p 10000 -U postgres -h 127.0.0.1 -d postgres psql (10devel) Type "help" for help. postgres=#
PostgreSQL的分区表跟先前版本一样,也要先建立主表,然后再建立子表,使用继承的特性,但不需要手工写规则了,这个比较赞阿。目前支持range、list分区,10正式版本发布时不知会不会支持其它方法。
range分区表
1、分区主表
create table order_range(id bigserial not null,userid integer,product text, createdate date) partition by range ( createdate );
分区主表不能建立全局约束,使用partition by range(xxx)说明分区的方式,xxx可以是多个字段,表达式……,具体见https://www.postgresql.org/docs/devel/static/sql-createtable.html
2、分区子表
create table order_range(ihttp://www.devze.comd bigserial not null,userid integer,product text, createdate date not null) partition by range ( createdate ); create table order_range_201701 partition of order_range(id primary key,userid,product, createdate) for values from ('2017-01-01') to ('2017-02-01'); create table order_range_201702 partition of order_range(id primary key,userid,product, createdate) for values from ('2017-02-01') to ('2017-03-01');
- 说明:
- 建立分区表时必需指定主表。
- 分区表和主表的 列数量,定义 必须完全一致。
- 分区表的列可以单独增加Default值,或约束。
- 当用户向主表插入数据库时,系统自动路由到对应的分区,如果没有找到对应分区,则抛出错误。
- 指定分区约束的值(范围,LIST值),范围,LIST不能重叠,重叠的路由会卡壳。
- 指定分区的列必需设置成not null,如建立主表时没设置系统会自动加上。
- Range分区范围为 >=最小值 and <最大值……
- 不支持通过更新的方法把数据从一个区移动到另外一个区,这样做会报错。如果要这样做的话需要删除原来的记录,再INSERT一条新的记录。
- 修改主表的字段名,字段dhFOfqIjL类型时,会自动同时修改所有的分区。
- TRUNCATE 主表时,会清除所有继承表分区的记录,如果要清除单个分区,请对分区进行操作。
- DROP主表时会把所有子表一起给DROP掉,如果drop单个分区,请对分区进行操作。
- 使用psql能查看分区表的详细定义。
postgres=# \d+ order_range Table "public.order_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+---------+-----------+----------+-----------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('order_range_id_seq'::regclass) | plain | | userid | integer | | | | plain | | product | text | | | | extended | | createdate | date | | not null | | plain | | Partition key: RANGE (createdate) Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'), order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01') postgres=#
list分区表
1、分区主表
create table order_list(id bigserial not null,userid integer,product text,area text, createdate date) partition by list( area );
2、分区子表
create table order_list_gd partition of order_list(id primary key,userid,product,area,createdate) for values in ('广东'); create table order_list_bj partition of order_list(id primary key,userid,product,area,createdate) for values in ('北京');
多级分区表
先按地区分区,再按日期分区
1、主表
create table order_range_list(id bigserial not null,userid integer,product text,area text, createdate date) partition by list ( area );
2、一级分区表
create table order_range_list_gd partition of order_range_list for values in ('广东') partition by range(createdate); create table order_range_list_bj partition of order_range_list for values in ('北京') partition by range(createdate);
3、二级分区表
create table order_range_list_gd_201701 partition of order_range_list_gd(id primary key,userid,product,area,createdate) for values from ('2017-01-01') to ('2017-02-01'); create table order_range_list_gd_201702 partition of order_range_list_gd(id primary key,userid,product,area,createdate) for values from ('2017-02-01') to ('2017-03-01'); create table order_range_list_bj_201701 partition of order_range_list_bj(id primary key,userid,product,area,createdate) for values from ('2017-01-01') to ('2017-02-01'); create table order_range_list_bj_201702 partition of order_range_list_bj(id primary key,userid,product,area,createdate) for values from ('2017-02-01') to ('2017-03-01');
直接操作分区也要受分区规则的约束
postgres=# insert into order_range_201702 (id,userid,product,createdate) values(1,
(random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01')); ERROR: new row for relation "order_range_201702" violates partition constraintDETAIL: Failing row contains (1, 322345, 51a9357a78416d11a018949a42dd2f8d, 2017-01-01).
INSERT提示违反了分区约束
postgres=# update order_range_201701 set createdate='2017-02-01' where createdate='2017-01-17';
ERROR: new row for relation "order_range_201701" violates partition constraintDETAIL: Failing row contains (1, 163357, 7e8fbe7b632a54ba1ec401d969f3259a, 2017-02-01).UPDATE提示违反了分区约束
如果分区表是外部表,则约束失效,后面有介绍
使用ALTER TABLE xxx ATTACH[DETACH] PARTITION 增加或删除分区
1、移除分区
录入2条测试数据
postgres=# insert into order_range (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date+ (random()*31)::integer)); INSERT 0 1 Time: 25.006 ms postgres=# insert into order_range (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date+ (random()*31)::integer)); INSERT 0 1 Time: 7.601 ms postgres=# select * from order_range; id | userid | product | createdate ----+--------+----------------------------------+------------ 1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17 2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27 (2 rows)
删除分区
postgres=# alter table order_range detach partition order_range_201701; ALTER TABLE Time: 14.129 ms
查看确认分区没了
postgres=# \d+ order_range; Table "public.order_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+---------+-----------+----------+-----------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('order_range_id_seq'::regclass) | plain | | userid | integer | | | | plain | | product | text | | | | extended | | createdate | date | | not null | | plain | | Partition key: RANGE (createdate) Partitions: order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01') postgres=#
数据也查不出来了
postgres=# select * from order_range; id | userid | product | createdate ----+--------+---------+------------ (0 rows) Time: 0.505 ms
但分区表还在
postgres=# select * from order_range_201701; id | userid | product | createdate ----+--------+----------------------------------+------------ 1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17 2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27 (2 rows) Time: 0.727 ms
2、添加分区
postgres=# alter table order_range attach partition order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'); ERROR: column "createdate" in child table must be marked NOT NULL Time: 0.564 ms
增加子表里,约束需要与主表一致
postgres=# alter table order_range_201701 alter column createdate set not null; ALTER TABLE Time: 17.345 ms postgres=# alter table order_range attach partition order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-01-15'); ERROR: partition constraint is violated by some row Time: 1.276 ms
加回来时可以修改其约束范围,但数据必需在约束的规则范围内
postgres=# alter table order_range attach partition order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'); ALTER TABLE Time: 18.407 ms
分区表又加回来了
postgres=# \d+ order_range Table "public.order_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+---------+-----------+----------+-----------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('order_range_id_seq'::regclass) | plain | | userid | integer | | | | plain | | product | text | | | | extended | | createdate | date | | not null | | plain | | Partition key: RANGE (createdate) Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'), order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01') postgres=# select * from order_range; id | userid | product | createdate ----+--------+----------------------------------+------------ 1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17 2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27 (2 rows) Time: 0.627 ms
添加外部表作为分区表
--增加一个新库,建立需要的外部表
[postgres@ad root]$ /usr/local/pgsql10/bin/psql -p 10000 -U postgres -h 127.0.0.1 -d postgres psql (10devel) Type "help" for help. #建立数据库 postgres=# create database postgres_fdw; CREATE DATABASE postgres_fdw=# create table order_range_fdw(id bigserial not null,userid integer,product text, createdate date not null); CREATE TABLE postgres_fdw=# #录入一条测试数据 postgres_fdw=# insert into order_range_fdw (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date- (random()*31)::integer)); INSERT 0 1 postgres_fdw=# select * from order_range_fdw; id | userid | product | createdate ----+--------+----------------------------------+------------ 2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-22 (1 row)
--在postgres库中增加外部表order_range_fdw
[postgres@ad root]$ /usr/local/pgsql10/bin/psql -p 10000 -U postgres -h 127.0.0.1 -d postgres psql (10devel) Type "help" for help. #增加postgres_fdw模块 postgres=# create extension postgres_fdw; CREATE EXTENSION #建立外部服务器 postgres=# CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '192.168.1.10', port '10000', dbname 'postgres_fdw'); CREATE SERVER #建立外部服务器用户标识 postgres=# CREATE USER MAPPING FOR postgres postgres-# SERVER foreign_server postgres-# OPTIONS (user 'postgres', password ''); CREATE USER MAPPING #建立外部表 postgres=# CREATE FOREIGN TABLE order_range_fdw ( postgres(# id bigint not null, postgres(# userid integer, postgres(# product text, postgres(# createdate date not null postgres(# ) postgres-# SERVER foreign_server postgres-# OPTIONS (schema_name 'public', table_name 'order_range_fdw'); CREATE FOREIGN TABLE #查询数据 postgres=# select * from order_range_fdw; id | userid | product | createdate ----+--------+----------------------------------+------------ 2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-22 (1 row) --将外部表作为分区表添加到order_range下 #添加分区表 postgres=# alter table order_range attach partition order_range_fdw FOR VALUES FROM ('1900-01-01') TO ('2017-01-01'); ALTER TABLE #查看order_range下的所有分区表 postgres=# \d+ order_range Table "public.order_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+---------+-----------+----------+-----------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('order_range_id_seq'::regclass) | plain | | userid | integer | | | | plain | | product | text | | | | extended | | createdate | date | | not null | | plain | | Partition key: RANGE (createdate) Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'), order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01'), order_range_fdw FOR VALUES FROM ('1900-01-01') TO ('2017-01-01') #查询数据 postgres=# select * from order_range where createdate<'2017-01-01'; id | userid | product | createdate ----+--------+----------------------------------+------------ 2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-22 (1 row) #查看执行计划 postgres=# explain select * from order_range where createdate<'2017-01-01'; QUERY PLAN -------------------------------------------------------------------------------- Append (cost=100.00..131.79 rows=379 width=48) -> Foreign Scan on order_range_fdw (cost=100.00..131.79 rows=379 width=48) (2 rows) #测试看看能不能更新数据 postgres=# insert into order_range (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date- (random()*31)::integer)); ERROR: cannot route inserted tuples to a foreign table postgres=# update order_range set createdate='2016-12-01' where createdate='2016-12-22'; UPDATE 1 postgres=# select * from order_range where createdate<'2017-01-01'; id | userid | product | createdate ----+--------+----------------------------------+------------ 2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-01 (1 row) postgres=# delete from order_range where createdate='2016-12-01'; DELETE 1 postgres=# select * from order_range where createdate<'2017-01-01'; id | userid | product | createdate ----+--------+---------+------------ (0 rows) postgres=#
插入数据时竟然不能路由到外部表,这个是处于什么考虑呢???,源码中只是提示 /* We do not yet have a way to insert into a foreign partition */
还没有办法这样做,猜猜后面的版本应该能实现
下面再说说使用外部表作为分区表还有一些问题
1、无法约束向分区表插入约束外的数据,如下所示
postgres=# \d+ order_range Table "public.order_range" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description ------------+---------+-----------+----------+-----------------------------------------+----------+--------------+------------- id | bigint | | not null | nextval('order_range_id_seq'::regclass) | plain | | userid | integer | | | | plain | | product | text | | | | extended | | createdate | date | | not null | | plain | | Partition key: RANGE (createdate) Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'), order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01'), order_range_fdw FOR VALUES FROM ('1900-01-01') TO ('2017-01-01') postgres=# postgres=# insert into order_range_fdw (id,userid,product,createdate) values(1, (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01')); INSERT 0 1 postgres=# select * from order_range; id | userid | product | createdate ----+--------+----------------------------------+------------ 1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17 2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27 1 | 621895 | 5546c6e2a7006b52b5c2df55e19b3759 | 2017-02-01 4 | 313019 | 445316004208e09fb4e7eda2bf5b0865 | 2017-01-01 1 | 505836 | 6e9232c4863c82a2e97b9157996572ea | 2017-01-01 (5 rows) postgres=# select * from order_range where createdate ='2017-01-01'; id | userid | product | createdate ----+--------+---------+------------ (0 rows)
如果这样操作会导致数据查询出现不匹配。
2、sql执行时无法下推
Sql执行无法下推的话对于聚集函数的执行存在很大的性能问题,使用时一定要特别的注意,如下所示
postgres=# delete from order_range_fdw; DELETE 1 postgres=# insert into order_range_fdw (id,userid,product,createdate) values(1, (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2016-01-01')); INSERT 0 1 postgres=# insert into order_range_fdw (id,userid,product,createdate) values(1, (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2016-02-01')); INSERT 0 1 #访问order_range,基执行是 postgres=# explain analyze select count(1) from order_range where createdate<'2017-01-01'; QUERY PLAN ------------------------------------------------------------------------------------------ Aggregate (cost=178.27..178.28 rows=1 width=8) (actual time=0.656..0.656 rows=1 loops=1) -> Append (cost=100.00..175.42 rows=1138 width=0) (actual time=0.647..0.649 rows=2 loops=1) -> Foreign Scan on order_range_fdw (cost=100.00..175.42 rows=1138 width=0) (actual time=0.647..0.648 rows=2 loops=1) Planning time: 0.267 ms Execution time: 1.122 ms (5 rows) #直接访问外部表 postgres=# explain analyze select count(1) from order_range_fdw where createdate<'2017-01-01'; QUERY PLAN ------------------------------------------------------------------------------------------- Foreign Scan (cost=102.84..155.54 rows=1 width=8) (actual time=0.661..0.662 rows=1 loops=1) Relations: Aggregate on (public.order_range_fdw) Planning time: 0.154 ms Execution time: 1.051 ms (4 rows)
3、sql查询需要访问的分区表中包含了“外部分区表”和“非外部分区表”时, 无法使用Parallel Seq Scan,如下所示
#插入100W数据到分区表中 postgres=# insert into order_range (userid,product,createdate) SELECT (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date+ (random()*58)::integer) from generate_series(1,1000000); INSERT 0 1000000 #访问所有的分区表 postgres=# explain select count(1) from order_range; QUERY PLAN --------------------------------------------------------------------------------------- Aggregate (cost=24325.22..24325.23 rows=1 width=8) -> Append (cost=0.00..21558.23 rows=1106797 width=0) -> Seq Scan on order_range_201701 (cost=0.00..11231.82 rows=580582 width=0) -> Seq Scan on order_range_201702 (cost=0.00..10114.02 rows=522802 width=0) -> Foreign Scan on order_range_fdw (cost=100.00..212.39 rows=3413 width=0) (5 rows) #只访问“非外部分区表” postgres=# explain select count(1) from order_range where createdate>='2017-01-01'; QUERY PLAN ------------------------------------------------------------------------------------- Finalize Aggregate (cost=17169.84..17169.85 rows=1 width=8) -> Gather (cost=17169.62..17169.83 rows=2 width=8) Workers Planned: 2 -> Partial Aggregate (cost=16169.62..16169.63 rows=1 width=8) -> Append (cost=0.00..15803.52 rows=146440 width=0) -> Parallel Seq Scan on order_range_201701 (cost=0.00..8449.86 rows=80636 width=0) Filter: (createdate >= '2017-01-01'::date) -> Parallel Seq Scan on order_range_201702 (cost=0.00..7353.66 rows=65804 width=0) Filter: (createdate >= '2017-01-01'::date) (9 rows) postgres=#
外部分区表的应用场景
将业务库上的不再修改的冷数全部分离到另一个节点上面,然后做为外部分区表挂上来。这样可以保持业务库的容量尽可以的轻,同时也不会对业务有侵入,这一点是非常的友好。但要注意Sql执行无法下推的问题,无法使用Parallel Seq Scan问题。
如果在后面版本中能解决fdw partition insert路由问题和sql语句执行下推问题那么就可以拿来做olap应用了。
四、建立测试业务表
下面模似一个用户收支流水表
--非分区表
create table t_pay_all (id serial not null primary key,userid integer not null,pay_money float8 not null,createdate date not null); create index t_pay_all_userid_idx on t_pay_all using btree(userid); create index t_pay_all_createdate_idx on t_pay_all using btree(createdate);
--分区表
生成12个分区,一个月份一个表
create table t_pay (id serial not null,userid integer not null,pay_money float8 not null,createdate date not null) partition by range (createdate); create table t_pay_201701 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-01-01') to ('2017-02-01'); create index t_pay_201701_createdate_idx on t_pay_201701 using btree(createdate); create index t_pay_201701_userid_idx on t_pay_201701 using btree(userid); create table t_pay_201702 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-02-01') to ('2017-03-01'); create index t_pay_201702_createdate_idx on t_pay_201702 using btree(createdate); create index t_pay_201702_userid_idx on t_pay_201702 using btree(userid); create table t_pay_201703 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-03-01') to ('2017-04-01'); create index t_pay_201703_createdate_idx on t_pay_201703 using btree(createdate); create index t_pay_201703_userid_idx on t_pay_201703 using btree(userid); create table t_pay_201704 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-04-01') to ('2017-05-01'); create index t_pay_201704_createdate_idx on t_pay_201704 using btree(createdate); create index t_pay_201704_userid_idx on t_pay_201704 using btree(userid); create table t_pay_201705 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-05-01') to ('2017-06-01'); create index t_pay_201705_createdate_idx on t_pay_201705 using btree(createdate); create index t_pay_201705_userid_idx on t_pay_201705 using btree(userid); create table t_pay_201706 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-06-01') to ('2017-07-01'); create index t_pay_201706_createdate_idx on t_pay_201706 using btree(createdate); create index t_pay_201706_userid_idx on t_pay_201706 using btree(userid); create table t_pay_201707 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-07-01') to ('2017-08-01'); create index t_pay_201707_createdate_idx on t_pay_201707 using btree(createdate); create index t_pay_201707_userid_idx on t_pay_201707 using btree(userid); create table t_pay_201708 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-08-01') to ('2017-09-01'); create index t_pay_201708_createdate_idx on t_pay_201708 using btree(createdate); create index t_pay_201708_userid_idx on t_pay_201708 using btree(userid); create table t_pay_201709 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-09-01') to ('2017-10-01'); create index t_pay_201709_createdate_idx on t_pay_201709 using btree(createdate); create index t_pay_201709_userid_idx on t_pay_201709 using btree(userid); create table t_pay_201710 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-10-01') to ('2017-11-01'); create index t_pay_201710_createdate_idx on t_pay_201710 using btree(createdate); create index t_pay_201710_userid_idx on t_pay_201710 using btree(userid); create table t_pay_201711 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-11-01') to ('2017-12-01'); create index t_pay_201711_createdate_idx on t_pay_201711 using btree(createdate); create index t_pay_201711_userid_idx on t_pay_201711 using btree(userid); create table t_pay_201712 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-12-01') to ('2018-01-01'); create index t_pay_201712_createdate_idx on t_pay_201712 using btree(createdate); create index t_pay_201712_userid_idx on t_pay_201712 using btree(userid);
五、性能测试
数据导入
--生成测试数据1000W条记录(尽可能平均分布)
postgres=# copy (select (random()::numeric(7,6)*1000000)::integer as userid,round((random()*100)::numeric,2) as pay_money,('2017-01-01'::date+ (random()*364)::integer) as createtime from generate_series(1,10000000)) to '/home/pg/data.txt'; COPY 10000000 Time: 42674.548 ms (00:42.675)
--非分区表数据导入测试
postgres=# copy t_pay_all(userid,pay_money,createdate) from '/home/pg/data.txt'; COPY 10000000 Time: 114258.743 ms (01:54.259)
--分区表数据导入测试
postgres=# copy t_pay(userid,pay_money,createdate) from '/home/pg/data.txt'; COPY 10000000 Time: 186358.447 ms (03:06.358) postgres=#
结论:数据导入时性能相差大约是一半,所以大数据量导入时最好直接导成分区表数据,然后直接对分区表进行操作
查询某一天的数据--直接从cache里取数据
--非分区表
postgres=# explain (analyze,buffers) select * from t_pay_all where createdate ='2017-06-01'; QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on t_pay_all (cost=592.06..50797.88 rows=27307 width=20) (actual time=14.544..49.039 rows=27384 loops=1) Recheck Cond: (createdate = '2017-06-01'::date) Heap Blocks: exact=22197 Buffers: shared hit=22289 -> Bitmap Index Scan on t_pay_all_createdate_idx (cost=0.00..585.24 rows=27307 width=0) (actual time=7.121..7.121 rows=27384 loops=1) Index Cond: (createdate = '2017-06-01'::date) Buffers: shared hit=92 Planning time: 0.153 ms Execution time: 51.583 ms (9 rows) Time: 52.272 ms
--分区表
postgres=# explain (analyze,buffers) select * from t_pay where createdate ='2017-06-01'; QUERY PLAN ---------------------------------------------------------------------------------------------- Append (cost=608.92..6212.11 rows=27935 width=20) (actual time=4.880..27.032 rows=27384 loops=1) Buffers: shared hit=5323 -> Bitmap Heap Scan on t_pay_201706 (cost=608.92..6212.11 rows=27935 width=20) (actual time=4.879..21.990 rows=27384 loops=1) Recheck Cond: (createdate = '2017-06-01'::date) Heap Blocks: exact=5226 Buffers: shared hit=5323 -> Bitmap Index Scan on t_pay_201706_createdate_idx (cost=0.00..601.94 rows=27935 width=0) (actual time=3.399..3.399 rows=27384 loops=1) Index Cond: (createdate = '2017-06-01'::date) Buffers: shared hit=97 Planning time: 0.521 ms Execution time: 30.061 ms (11 rows)
结论:分区表的Planning time时间明显比非分区表要高,但比起Execution time基本可以忽略。
查询某个时间范围的数据
1、时间范围落在同一个分区内
--非分区表
postgres=# explain (analyze,buffers)select * from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-07-01'; QUERY PLAN ------------------------------------------------------------------------------------------ Bitmap Heap Scan on t_pay_all (cost=19802.01..95862.00 rows=819666 width=20) (actual time=115.210..459.547 rows=824865 loops=1) Recheck Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Heap Blocks: exact=63701 Buffers: shared read=66578 -> Bitmap Index Scan on t_pay_all_createdate_idx (cost=0.00..19597.10 rows=819666 width=0) (actual time=101.453..101.453 rows=825865 loops=1) Index Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared read=2877 Planning time: 0.166 ms Execution time: 504.297 ms (9 rows) Time: 505.021 ms postgres=# explain (analyze,buffers)select count(1) from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-07-01'; QUERY PLAN ---------------------------------------------------------------------------------------------- Finalize Aggregate (cost=90543.96..90543.97 rows=1 width=8) (actual time=335.334..335.335 rows=1 loops=1) Buffers: shared hit=351 read=66593 -> Gather (cost=90543.74..90543.95 rows=2 width=8) (actual time=334.988..335.327 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=351 read=66593 -> Partial Aggregate (cost=89543.74..89543.75 rows=1 width=8) (actual time=330.796..330.797 rows=1 loops=3) Buffers: shared read=66578 -> Parallel Bitmap Heap Scan on t_pay_all (cost=19802.01..88689.92 rows=341528 width=0) (actual time=124.126..303.125 rows=274955 loops=3) Recheck Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07www.devze.com-01'::date)) Heap Blocks: exact=25882 Buffers: shared read=66578 -> Bitmap Index Scan on t_pay_all_createdate_idx (cost=0.00..19597.10 rows=819666 width=0) (actual time=111.233..111.233 rows=825865 loops=1) Index Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared read=2877 Planning time: 0.213 ms Execution time: 344.013 ms (17 rows) Time: 344.759 ms postgres=#
--分区表
postgres=# explain (analyze,buffers)select * from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01'; QUERY PLAN ------------------------------------------------------------------------------------------- Append (cost=0.00..17633.97 rows=824865 width=20) (actual time=0.020..272.926 rows=824865 loops=1) Buffers: shared hit=5261 -> Seq Scan on t_pay_201706 (cost=0.00..17633.97 rows=824865 width=20) (actual time=0.019..170.128 rows=824865 loops=1) Filter: ((createdate >开发者_大数据= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared hit=5261 Planning time: 0.779 ms Execution time: 335.351 ms (7 rows) Time: 336.676 ms postgres=# explain (analyze,buffers)select count(1) from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01'; QUERY PLAN -------------------------------------------------------------------------------------------- Finalize Aggregate (cost=12275.86..12275.87 rows=1 width=8) (actual time=144.023..144.023 rows=1 loops=1) Buffers: shared hit=5429 -> Gather (cost=12275.64..12275.85 rows=2 width=8) (actual time=143.966..144.016 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=5429 -> Partial Aggregate (cost=11275.64..11275.65 rows=1 width=8) (actual time=140.230..140.230 rows=1 loops=3) Buffers: shared hit=5261 -> Append (cost=0.00..10416.41 rows=343694 width=0) (actual time=0.022..106.973 rows=274955 loops=3) Buffers: shared hit=5261 -> Parallel Seq Scan on t_pay_201706 (cost=0.00..10416.41 rows=343694 width=0) (actual time=0.020..68.952 rows=274955 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared hit=5261 Planning time: 0.760 ms Execution time: 145.289 ms (15 rows) Time: 146.610 ms
在同一个分区内查询优势明显
2、不在同一个分区内
--非分区表
postgres=# explain (analyze,buffers)select count(1) from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-12-01'; QUERY PLAN ------------------------------------------------------------------------------------------- Finalize Aggregate (cost=132593.42..132593.43 rows=1 width=8) (actual time=717.848..717.848 rows=1 loops=1) Buffers: shared hit=33571 read=30446 dirtied=9508 written=4485 -> Gather (cost=132593.20..132593.41 rows=2 width=8) (actual time=717.782..717.841 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=33571 read=30446 dirtied=9508 written=4485 -> Partial Aggregate (cost=131593.20..131593.21 rows=1 width=8) (actual time=714.096..714.097 rows=1 loops=3) Buffers: shared hit=33319 read=30446 dirtied=9508 written=4485 -> Parallel Seq Scan on t_pay_all (cost=0.00..126330.64 rows=2105024 width=0) (actual time=0.059..545.016 rows=1675464 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Rows Removed by Filter: 1661203 Buffers: shared hit=33319 read=30446 dirtied=9508 written=4485 Planning time: 0.178 ms Execution time: 721.822 ms (14 rows) Time: 722.521 ms
--分区表
postgres=# explain (analyze,buffers)select count(1) from t_pay where createdate >='2017-06-01' AND createdate<'2017-12-01'; QUERY PLAN ------------------------------------------------------------------------------------------ Finalize Aggregate (cost=69675.98..69675.99 rows=1 width=8) (actual time=714.560..714.560 rows=1 loops=1) Buffers: shared hit=27002 read=5251 -> Gather (cost=69675.77..69675.98 rows=2 width=8) (actual time=714.426..714.551 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=27002 read=5251 -> Partial Aggregate (cost=68675.77..68675.78 rows=1 width=8) (actual time=710.416..710.416 rows=1 loops=3) Buffers: shared hit=26774 read=5251 -> Append (cost=0.00..63439.94 rows=2094330 width=0) (actual time=0.023..536.033 rows=1675464 loops=3) Buffers: shared hit=26774 read=5251 -> Parallel Seq Scan on t_pay_201706 (cost=0.00..10416.41 rows=343694 width=0) (actual time=0.021..67.935 rows=274955 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Buffers: shared hit=5261 -> Parallel Seq Scan on t_pay_201707 (cost=0.00..10728.06 rows=354204 width=0) (actual time=0.007..54.999 rows=283363 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Buffers: shared hit=5415 -> Parallel Seq Scan on t_pay_201708 (cost=0.00..10744.08 rows=354738 width=0) (actual time=0.007..55.117 rows=283791 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Buffers: shared hit=5423 -> Parallel Seq Scan on t_pay_201709 (cost=0.00..10410.71 rows=343714 width=0) (actual time=0.007..53.402 rows=274971 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Buffers: shared hit=5255 -> Parallel Seq Scan on t_pay_201710 (cost=0.00..10737.41 rows=354494 width=0) (actual time=0.007..55.475 rows=283595 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Buffers: shared hit=5420 -> Parallel Seq Scan on t_pay_201711 (cost=0.00..10403.29 rows=343486 width=0) (actual time=0.036..57.635 rows=274789 loops=3) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date)) Buffers: shared read=5251 Planning time: 1.217 ms Execution time: 718.372 ms (30 rows)
跨分区查询,大约在跨一半分区时性能相当。
查询某个月里某个用户数据--直接从cache里取数据
1、数据都落在所在分区,并且数据量极少
--非分区表
postgres=# explain (analyze,buffers) select * from t_pay_all where createdate>='2017-06-01' AND createdate<'2017-07-01' and userid=268460; QUERY PLAN -------------------------------------------------------------------------------------------- Index Scan using t_pay_all_userid_idx on t_pay_all (cost=0.43..48.68 rows=1 width=20) (actual time=0.053..0.071 rows=7 loops=1) Index Cond: (userid = 268460) Filter: ((createdate >= '2017-06-01'::date) AND (createdatehttp://www.devze.com < '2017-07-01'::date)) Rows Removed by Filter: 10 Buffers: shared hit=20 Planning time: 0.149 ms Execution time: 0.101 ms (7 rows) Time: 0.676 ms
--分区表
postgres=# explain (analyze,buffers) select * from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01' and userid=268460; QUERY PLAN ------------------------------------------------------------------------------------------ Append (cost=0.42..12.47 rows=2 width=20) (actual time=0.019..0.032 rows=7 loops=1) Buffers: shared hit=10 -> Index Scan using t_pay_201706_userid_idx on t_pay_201706 (cost=0.42..12.47 rows=2 width=20) (actual time=0.018..0.029 rows=7 loops=1) Index Cond: (userid = 268460) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared hit=10 Planning time: 0.728 ms Execution time: 0.064 ms (8 rows) Time: 1.279 ms
在返回记录极少的情况下由于分布表的Planning time开销较大,所以非分区表有优势
2、数据落在其它分区,并且数据量比较大
--非分区表
postgres=# explain (analyze,buffers) select * from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-07-01' and userid=302283 ; QUERY PLAN --------------------------------------------------------------------------------------------- Bitmap Heap Scan on t_pay_all (cost=19780.69..22301.97 rows=683 width=20) (actual time=91.778..91.803 rows=2 loops=1) Recheck Cond: ((userid = 302283) AND (createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Heap Blocks: exact=9 Buffers: shared hit=2927 -> BitmapAnd (cost=19780.69..19780.编程客栈69 rows=683 width=0) (actual time=91.767..91.767 rows=0 loops=1) Buffers: shared hit=2918 -> Bitmap Index Scan on t_pay_all_userid_idx (cost=0.00..183.00 rows=8342 width=0) (actual time=0.916..0.916 rows=11013 loops=1) Index Cond: (userid = 302283) Buffers: shared hit=41 -> Bitmap Index Scan on t_pay_all_createdate_idx (cost=0.00..19597.10 rows=819666 width=0) (actual time=90.837..90.837 rows=825865 loops=1) Index Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared hit=2877 Planning time: 0.172 ms Execution time: 91.851 ms (14 rows) Time: 92.534 ms
--分区表
postgres=# explain (analyze,buffers) select * from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01' and userid=302283 ; QUERY PLAN ------------------------------------------------------------------------------------------- Append (cost=0.42..12.47 rows=2 width=20) (actual time=0.042..0.046 rows=2 loops=1) Buffers: shared hit=7 -> Index Scan using t_pay_201706_userid_idx on t_pay_201706 (cost=0.42..12.47 rows=2 width=20) (actual time=0.041..0.045 rows=2 loops=1) Index Cond: (userid = 302283) Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date)) Buffers: shared hit=7 Planning time: 0.818 ms Execution time: 0.096 ms (8 rows) Time: 1.499 ms
这是分区表最大的优势体现了,性能提升不是一般的大
索引维护
--非分区表
postgres=# REINDEX INDEX t_pay_all_createdate_idx; REINDEX Time: 11827.344 ms (00:11.827)
--分区表
postgres=# REINDEX INDEX t_pay_201706_createdate_idx; REINDEX Time: 930.439 ms postgres=#
这个也是分区表的优势,可以针对某个分区的索引进行重建。
删除整个分区数据
--非分区表
postgres=# delete from t_pay_all where createdate >='2017-06-01' and createdate<'2017-07-01'; DELETE 824865 Time: 5775.545 ms (00:05.776)
--分区表
postgres=# truncate table t_pay_201706; TRUNCATE TABLE Time: 177.809 ms
个也是分区表的优势,可以对某个分区直接truncate
全表扫描
--非分区表
postgres=# explain analyze select count(1) from t_pay; QUERY PLAN --------------------------------------------------------------------------------------------- Finalize Aggregate (cost=107370.96..107370.97 rows=1 width=8) (actual time=971.561..971.561 rows=1 loops=1) -> Gather (cost=107370.75..107370.96 rows=2 width=8) (actual time=971.469..971.555 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=106370.75..106370.76 rows=1 width=8) (actual time=967.378..967.378 rows=1 loops=3) -> Append (cost=0.00..96800.40 rows=3828141 width=0) (actual time=0.019..698.882 rows=3061712 loops=3) -> Parallel Seq Scan on t_pay_201701 (cost=0.00..8836.14 rows=349414 width=0) (actual time=0.017..48.716 rows=279531 loops=3) -> Parallel Seq Scan on t_pay_201702 (cost=0.00..8119.94 rows=321094 width=0) (actual time=0.007..33.072 rows=256875 loops=3) -> Parallel Seq Scan on t_pay_201703 (cost=0.00..9079.47 rows=359047 width=0) (actual time=0.006..37.153 rows=287238 loops=3) -> Parallel Seq Scan on t_pay_201704 (cost=0.00..8672.67 rows=342968 width=0) (actual time=0.006..35.317 rows=274374 loops=3) -> Parallel Seq Scan on t_pay_201705 (cost=0.00..8975.23 rows=354923 width=0) (actual time=0.006..36.571 rows=283938 loops=3) -> Parallel Seq Scan on t_pay_201706 (cost=0.00..20.00 rows=1000 width=0) (actual time=0.000..0.000 rows=0 loops=3) -> Parallel Seq Scan on t_pay_201707 (cost=0.00..8957.04 rows=354204 width=0) (actual time=0.006..36.393 rows=283363 loops=3) -> Parallel Seq Scan on t_pay_201708 (cost=0.00..8970.38 rows=354738 width=0) (actual time=0.006..37.015 rows=283791 loops=3) -> Parallel Seq Scan on t_pay_201709 (cost=0.00..8692.14 rows=343714 width=0) (actual time=0.006..35.187 rows=274971 loops=3) -> Parallel Seq Scan on t_pay_201710 (cost=0.00..8964.94 rows=354494 width=0) (actual time=0.006..36.566 rows=283595 loops=3) -> Parallel Seq Scan on t_pay_201711 (cost=0.00..8685.86 rows=343486 width=0) (actual time=0.006..35.198 rows=274789 loops=3) -> Parallel Seq Scan on t_pay_201712 (cost=0.00..8826.59 rows=349059 width=0) (actual time=0.006..36.523 rows=279247 loops=3) Planning time: 0.706 ms Execution time: 977.364 ms (20 rows) Time: 978.705 ms postgres=#
--分区表
postgres=# explain analyze select count(1) from t_pay_all; QUERY PLAN ------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=116900.63..116900.64 rows=1 width=8) (actual time=644.093..644.093 rows=1 loops=1) -> Gather (cost=116900.42..116900.63 rows=2 width=8) (actual time=644.035..644.087 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=115900.42..115900.43 rows=1 width=8) (actual time=640.587..640.587 rows=1 loops=3) -> Parallel Seq Scan on t_pay_all (cost=0.00..105473.33 rows=4170833 width=0) (actual time=0.344..371.965 rows=3061712 loops=3) Planning time: 0.164 ms Execution time: 645.438 ms (8 rows) Time: 646.027 ms
全扫描时分区表落后,但还基本上能接收。
增加新的分区并导入数据
--生成新的分区数据
copy (select userid,pay_money,createdate+31 as createdate from t_pay_201712) to '/home/pg/201801.txt';
--建立新的分区
create table t_pay_201801 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2018-01-01') to ('2018-02-01'); create index t_pay_201801_createdate_idx on t_pay_201801 using btree(createdate); create index t_pay_201801_userid_idx on t_pay_201801 using btree(userid);
--非分区表
postgres=# copy t_pay_all(userid,pay_money,createdate) from '/home/pg/201801.txt'; COPY 837741 Time: 18105.024 ms (00:18.105)
--分区表
postgres=# copy t_pay(userid,pay_money,createdate) from '/home/pg/201801.txt'; COPY 837741 Time: 13864.950 ms (00:13.865) postgres=#
新的分区数据导入保持优势
到此这篇关于PostgreSQL 10分区表详解及性能测试报告的文章就介绍到这了,更多相关PostgreSQL 10分区表性能测试内容请搜索我们以前的文章或继续浏览下面的相关文章希望大家以后多多支持我们!
精彩评论