首页
解决方案
数据库专业技术服务全栈式PostgreSQL解决方案Oracle分布式存储化数据库云PolarDB一体化解决方案
产品
CLup:PostgreSQL高可用集群平台 CMiner: PostgreSQL中的CDC CData高性能数据库云一体机 CBackup数据库备份恢复云平台 CPDA高性能双子星数据库机 CSYun超融合虚拟机产品 ZQPool数据库连接池 ConshGuard数据保护产品 APCC: Greenplum管理平台
文档
文章
客户及伙伴
中启开源
关于我们
公司简介 联系我们
中启开源

1. 疑惑

我们知道控制分区裁剪的参数有两个:

这两个参数有什么区别呢?

2. 解答

要说明这两个参数的区别需要先讲一讲PostgreSQL数据库中分区的历史,在PostgreSQL 10版本之前,PostgreSQL数据库实际上是没有单独的创建分区表的DDL语句,都是通过表继承的原理来创建分区表,这样使得在PostgreSQL中使用分区表不是很方便,到PostgreSQL 10之后,PostgreSQL扩展了创建表的DDL语句,可以用这个DDL语句来创建分区表,原先使用继承的方式还是可以创建分区表,但这两种分区表是不能混用的。于是PostgreSQL 10增加的分区表叫声明式分区(Declarative Partitioning),原先使用表继承的方式仍然可以实现分区表的功能。

而使用继承的方式实现的分区表的分区裁剪是靠设置参数“constraint_exclusion=partition”来实现的,而如果使用了声明式分区表,则需要使用参数“enable_partition_pruning”来控制是否使用分区裁剪功能。

3. 测试

创建声明式分区表:

  1. CREATE TABLE ptab01 (
  2. id int not null,
  3. tm timestamptz not null
  4. ) PARTITION BY RANGE (tm);
  5. create table ptab01_202001 partition of ptab01 for values from ('2020-01-01') to ('2020-02-01');
  6. create table ptab01_202002 partition of ptab01 for values from ('2020-02-01') to ('2020-03-01');
  7. create table ptab01_202003 partition of ptab01 for values from ('2020-03-01') to ('2020-04-01');
  8. create table ptab01_202004 partition of ptab01 for values from ('2020-04-01') to ('2020-05-01');
  9. create table ptab01_202005 partition of ptab01 for values from ('2020-05-01') to ('2020-06-01');
  10. insert into ptab01 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;

创建传统的继承式的分区表:

  1. CREATE TABLE ptab02 (
  2. id int not null,
  3. tm timestamptz not null
  4. );
  5. CREATE TABLE ptab02_202001 (
  6. CHECK ( tm >= '2020-01-01'::timestamptz AND tm < '2020-02-01'::timestamptz )
  7. ) INHERITS (ptab02);
  8. CREATE TABLE ptab02_202002 (
  9. CHECK ( tm >= '2020-02-01'::timestamptz AND tm < '2020-03-01'::timestamptz )
  10. ) INHERITS (ptab02);
  11. CREATE TABLE ptab02_202003 (
  12. CHECK ( tm >= '2020-03-01'::timestamptz AND tm < '2020-04-01'::timestamptz )
  13. ) INHERITS (ptab02);
  14. CREATE TABLE ptab02_202004 (
  15. CHECK ( tm >= '2020-04-01'::timestamptz AND tm < '2020-05-01'::timestamptz )
  16. ) INHERITS (ptab02);
  17. CREATE TABLE ptab02_202005 (
  18. CHECK ( tm >= '2020-05-01'::timestamptz AND tm < '2020-06-01'::timestamptz )
  19. ) INHERITS (ptab02);
  20. CREATE OR REPLACE FUNCTION ptab02_insert_trigger()
  21. RETURNS TRIGGER AS $$
  22. BEGIN
  23. IF NEW.tm >= '2020-01-01'::timestamptz AND NEW.tm < '2020-02-01'::timestamptz THEN
  24. INSERT INTO ptab02_202001 VALUES (NEW.*);
  25. ELSIF NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-03-01'::timestamptz THEN
  26. INSERT INTO ptab02_202002 VALUES (NEW.*);
  27. ELSIF NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-04-01'::timestamptz THEN
  28. INSERT INTO ptab02_202003 VALUES (NEW.*);
  29. ELSIF NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-05-01'::timestamptz THEN
  30. INSERT INTO ptab02_202004 VALUES (NEW.*);
  31. ELSIF NEW.tm >= '2020-02-01'::timestamptz AND NEW.tm < '2020-06-01'::timestamptz THEN
  32. INSERT INTO ptab02_202005 VALUES (NEW.*);
  33. ELSE
  34. RAISE 'value % out of range ', NEW.tm;
  35. END IF;
  36. RETURN NULL;
  37. END;
  38. $$
  39. LANGUAGE plpgsql;
  40. CREATE TRIGGER insert_ptab02_trigger
  41. BEFORE INSERT ON ptab02
  42. FOR EACH ROW EXECUTE FUNCTION ptab02_insert_trigger();
  43. insert into ptab02 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;

默认情况下constraint_exclusion为partition和enable_partition_pruning为on,不管是表继承方式实现的分区表还是声明式分区表都可以走到分区裁剪,如下所示:

  1. postgres=# show constraint_exclusion;
  2. constraint_exclusion
  3. ----------------------
  4. partition
  5. (1 row)
  6. postgres=# show enable_partition_pruning;
  7. enable_partition_pruning
  8. --------------------------
  9. on
  10. (1 row)
  11. postgres=# explain select * from ptab01 where tm='2020-01-07'::timestamptz;
  12. QUERY PLAN
  13. ---------------------------------------------------------------------------------------
  14. Gather (cost=1000.00..4417.51 rows=1 width=12)
  15. Workers Planned: 1
  16. -> Parallel Seq Scan on ptab01_202001 ptab01 (cost=0.00..3417.41 rows=1 width=12)
  17. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  18. (4 rows)
  19. postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
  20. QUERY PLAN
  21. -----------------------------------------------------------------------------------------------
  22. Gather (cost=1000.00..4417.62 rows=2 width=12)
  23. Workers Planned: 2
  24. -> Parallel Append (cost=0.00..3417.42 rows=2 width=12)
  25. -> Parallel Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..3417.41 rows=1 width=12)
  26. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  27. -> Parallel Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
  28. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  29. (7 rows)

从上面可以看出,声明式分区表只扫描了包括指定时间实际的分区ptab01_202001,没有扫描其他时间段的分区,继承式的分区表只扫描了父表ptab02和包括指定时间实际的分区ptab02_202001。

当我们把参数enable_partition_pruning设置为off,这是可以看到查询声明式分区表时,会扫描所有分区,没有进行分区裁剪,但是继承式分区表还是进行了分区裁剪:

  1. postgres=# set enable_partition_pruning to off;
  2. SET
  3. postgres=# explain select * from ptab01 where tm='2020-01-07'::timestamptz;
  4. QUERY PLAN
  5. -----------------------------------------------------------------------------------------------
  6. Gather (cost=1000.00..17758.00 rows=5 width=12)
  7. Workers Planned: 2
  8. -> Parallel Append (cost=0.00..16757.50 rows=5 width=12)
  9. -> Parallel Seq Scan on ptab01_202001 ptab01_1 (cost=0.00..3417.41 rows=1 width=12)
  10. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  11. -> Parallel Seq Scan on ptab01_202003 ptab01_3 (cost=0.00..3417.41 rows=1 width=12)
  12. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  13. -> Parallel Seq Scan on ptab01_202005 ptab01_5 (cost=0.00..3417.41 rows=1 width=12)
  14. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  15. -> Parallel Seq Scan on ptab01_202004 ptab01_4 (cost=0.00..3307.88 rows=1 width=12)
  16. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  17. -> Parallel Seq Scan on ptab01_202002 ptab01_2 (cost=0.00..3197.35 rows=1 width=12)
  18. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  19. (13 rows)
  20. postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
  21. QUERY PLAN
  22. -----------------------------------------------------------------------------------------------
  23. Gather (cost=1000.00..4417.62 rows=2 width=12)
  24. Workers Planned: 2
  25. -> Parallel Append (cost=0.00..3417.42 rows=2 width=12)
  26. -> Parallel Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..3417.41 rows=1 width=12)
  27. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  28. -> Parallel Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
  29. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  30. (7 rows)

当我们把参数constraint_exclusion设置为off,这是可以看到继承式分区表就没有再做分区裁剪了:

  1. postgres=# set constraint_exclusion to off;
  2. SET
  3. postgres=# explain select * from ptab02 where tm='2020-01-07'::timestamptz;
  4. QUERY PLAN
  5. -----------------------------------------------------------------------------------------------
  6. Gather (cost=1000.00..17758.10 rows=6 width=12)
  7. Workers Planned: 2
  8. -> Parallel Append (cost=0.00..16757.50 rows=6 width=12)
  9. -> Parallel Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..3417.41 rows=1 width=12)
  10. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  11. -> Parallel Seq Scan on ptab02_202003 ptab02_4 (cost=0.00..3417.41 rows=1 width=12)
  12. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  13. -> Parallel Seq Scan on ptab02_202005 ptab02_6 (cost=0.00..3417.41 rows=1 width=12)
  14. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  15. -> Parallel Seq Scan on ptab02_202004 ptab02_5 (cost=0.00..3307.88 rows=1 width=12)
  16. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  17. -> Parallel Seq Scan on ptab02_202002 ptab02_3 (cost=0.00..3197.35 rows=1 width=12)
  18. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  19. -> Parallel Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12)
  20. Filter: (tm = '2020-01-07 00:00:00+08'::timestamp with time zone)
  21. (15 rows)

由此可见:

4. 区别

通常声明式分区比继承式分区表在更多的情况下能走到分区裁剪,如在select * from tab01 where tm = (select tm from tab02);类似的SQL中:

  1. postgres=# explain analyze select * from ptab01 where tm = (select tm from tabtm);
  2. QUERY PLAN
  3. ------------------------------------------------------------------------------------------------------------------------------------------
  4. Gather (cost=1001.01..17759.01 rows=5 width=12) (actual time=4.154..27.762 rows=1 loops=1)
  5. Workers Planned: 2
  6. Params Evaluated: $0
  7. Workers Launched: 2
  8. InitPlan 1 (returns $0)
  9. -> Seq Scan on tabtm (cost=0.00..1.01 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
  10. -> Parallel Append (cost=0.00..16757.50 rows=5 width=12) (actual time=1.228..6.338 rows=0 loops=3)
  11. -> Parallel Seq Scan on ptab01_202001 ptab01_1 (cost=0.00..3417.41 rows=1 width=12) (actual time=3.652..18.978 rows=1 loops=1)
  12. Filter: (tm = $0)
  13. Rows Removed by Filter: 267839
  14. -> Parallel Seq Scan on ptab01_202003 ptab01_3 (cost=0.00..3417.41 rows=1 width=12) (never executed)
  15. Filter: (tm = $0)
  16. -> Parallel Seq Scan on ptab01_202005 ptab01_5 (cost=0.00..3417.41 rows=1 width=12) (never executed)
  17. Filter: (tm = $0)
  18. -> Parallel Seq Scan on ptab01_202004 ptab01_4 (cost=0.00..3307.88 rows=1 width=12) (never executed)
  19. Filter: (tm = $0)
  20. -> Parallel Seq Scan on ptab01_202002 ptab01_2 (cost=0.00..3197.35 rows=1 width=12) (never executed)
  21. Filter: (tm = $0)
  22. Planning Time: 0.121 ms
  23. Execution Time: 27.858 ms
  24. (20 rows)
  25. postgres=# explain analyze select * from ptab02 where tm = (select tm from tabtm);
  26. QUERY PLAN
  27. -------------------------------------------------------------------------------------------------------------------------------------------
  28. Gather (cost=1001.01..17759.11 rows=6 width=12) (actual time=107.141..107.310 rows=1 loops=1)
  29. Workers Planned: 2
  30. Params Evaluated: $0
  31. Workers Launched: 2
  32. InitPlan 1 (returns $0)
  33. -> Seq Scan on tabtm (cost=0.00..1.01 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=1)
  34. -> Parallel Append (cost=0.00..16757.50 rows=6 width=12) (actual time=59.127..81.720 rows=0 loops=3)
  35. -> Parallel Seq Scan on ptab02_202001 ptab02_2 (cost=0.00..3417.41 rows=1 width=12) (actual time=5.747..39.632 rows=0 loops=2)
  36. Filter: (tm = $0)
  37. Rows Removed by Filter: 133920
  38. -> Parallel Seq Scan on ptab02_202003 ptab02_4 (cost=0.00..3417.41 rows=1 width=12) (actual time=45.304..45.304 rows=0 loops=1)
  39. Filter: (tm = $0)
  40. Rows Removed by Filter: 267840
  41. -> Parallel Seq Scan on ptab02_202005 ptab02_6 (cost=0.00..3417.41 rows=1 width=12) (actual time=21.245..21.245 rows=0 loops=2)
  42. Filter: (tm = $0)
  43. Rows Removed by Filter: 133920
  44. -> Parallel Seq Scan on ptab02_202004 ptab02_5 (cost=0.00..3307.88 rows=1 width=12) (actual time=60.385..60.385 rows=0 loops=1)
  45. Filter: (tm = $0)
  46. Rows Removed by Filter: 259200
  47. -> Parallel Seq Scan on ptab02_202002 ptab02_3 (cost=0.00..3197.35 rows=1 width=12) (actual time=17.671..17.671 rows=0 loops=1)
  48. Filter: (tm = $0)
  49. Rows Removed by Filter: 250560
  50. -> Parallel Seq Scan on ptab02 ptab02_1 (cost=0.00..0.00 rows=1 width=12) (actual time=0.000..0.001 rows=0 loops=1)
  51. Filter: (tm = $0)
  52. Planning Time: 0.171 ms
  53. Execution Time: 107.431 ms
  54. (26 rows)

从上面可以看到声明式分区可以做动态的分区裁剪,不需要扫描的分区后面会有“(never executed)”的提示,表示把这些分区表都跳过了。