本文共 16235 字,大约阅读时间需要 54 分钟。
PostgreSQL , 优化器 , 成本因子 , 校对
最近在写一个PostgreSQL数据库培训PPT, 其中有explain的讲解需求, 刚刚接触PostgreSQL的童鞋对PostgreSQL的explain输出中cost的部分可能会一头雾水, 看不懂cost的值和SQL实际运行时间有什么联系.
为了让大家能更加深刻的了解explain, 我打算对explain的cost常量做一次校准, 这个校准的灵感来自天文望远镜赤道仪的校准. PostgreSQL cost常量在校准后, explain 的输出cost就会非常接近真实执行的时间.
接下来我们就来看一看如何校准PostgreSQL COST常量.
在赤道仪的校准中, 要用到北极星以及至少一颗已知赤经赤纬的亮星.
同样, 在PostgreSQL COST的校准中, 要用到的是已知算法以及真实的数据. 利用真实的数据和公式, 求出未知数, 达到校准的目的.
已知的数据可以来自硬件厂商或者自行测试得到, 已知的cost值算法则可以参考PostgreSQL手册或源代码.
src/backend/optimizer/path/costsize.c
PostgreSQL的cost常量如下 :
seq_page_cost -- 连续块扫描操作的单个块的cost. 例如全表扫描 random_page_cost -- 随机块扫描操作的单个块的cost. 例如索引扫描 cpu_tuple_cost -- 处理每条记录的CPU开销 cpu_index_tuple_cost -- 扫描每个索引条目带来的CPU开销 cpu_operator_cost -- 操作符或函数带来的CPU开销.(需要注意函数以及操作符对应的函数的三态, 执行计划会根据三态做优化, 关系到多条记录时三态对应的调用次数是需要关心的)
接下来举例说明如何校对这几个常量.
创建测试表
digoal=# create table tbl_cost_align (id int, info text, crt_time timestamp); CREATE TABLE
插入测试数据, 这里插入的ID为随机数, 这样的话可以使得我们后面要做的离散IO请求测试更准确一些.
digoal=# insert into tbl_cost_align select (random()*2000000000)::int, md5(random()::text), clock_timestamp() from generate_series(1,100000); INSERT 0 100000 digoal=# insert into tbl_cost_align select (random()*2000000000)::int, md5(random()::text), clock_timestamp() from generate_series(1,10000000); INSERT 0 10000000
分析表, 获得统计信息
digoal=# analyze tbl_cost_align; ANALYZE
可以查看到占用的数据块个数
digoal=# select relpages from pg_class where relname='tbl_cost_align'; relpages ---------- 94393 (1 row)
执行checkpoint后关闭数据库, 为了得到一个纯粹的物理磁盘的连续io请求的cost常量, 不能有shared buffer的干扰.
digoal=# checkpoint; CHECKPOINT pg93@db-172-16-3-150-> pg_ctl stop -m fast waiting for server to shut down.... done server stopped
同时还不能有OS Cache的干扰, 所以要清理操作系统cache.
[root@db-172-16-3-150 ssd1]# sync; echo 3 > /proc/sys/vm/drop_caches
前面我们说了, 有些指标可以通过硬件厂商得到或者自行测试得到, 那么这里我们就要自己测试得到.
测试方法比较多, 本文将通过systemtap来得到每次IO请求的时间.
为了降低systemtap带来的额外开销, 请参考 :
指定亲和1, 启动数据库 :
pg93@db-172-16-3-150-> taskset -c 1 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1
开启psql
pg93@db-172-16-3-150-> psql psql (9.3.1) Type "help" for help. digoal=# select pg_backend_pid(); pg_backend_pid ---------------- 5727 (1 row)
指定亲和7, 开启stap, 收集postgres进程相关的io信息.
[root@db-172-16-3-150 ~]# taskset -c 7 stap -e ' global a probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") { delete a println("query__start ", user_string($arg1), "pid:", pid()) } probe vfs.read.return { t = gettimeofday_ns() - @entry(gettimeofday_ns()) # if (execname() == "postgres" && devname != "N/A") a[pid()] <<< t } probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") { if (@count(a[pid()])) printdln("**", pid(), @count(a[pid()]), @avg(a[pid()])) println("query__done ", user_string($arg1), "pid:", pid()) if (@count(a[pid()])) { println(@hist_log(a[pid()])) #println(@hist_linear(a[pid()],1024,4096,100)) } delete a }' -x 5727
接下来在psql中执行explain analyze, 在explain的结果中可以得到一个值, 实际的执行时间(3260.695 -0.839).
并且可以得到原始的cost(195393.00), 这个原始的cost有助于验证公式是否正确.
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ ---- Seq Scan on postgres.tbl_cost_align (cost=0.00..195393.00 rows=10100000 width=45) (actual time=0.839..3260.695 rows=10100000 loops =1) Output: id, info, crt_time Buffers: shared read=94393 -- 注意这个read指的是未命中shared buffer, 如果是命中的话会有hit=? Total runtime: 4325.885 ms (4 rows)
执行完explain之后, 在stap输出中得到了我们想要的平均IO响应时间信息(14329).
query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:5727 5727**94417**14329 query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align;pid:5727 value |-------------------------------------------------- count 1024 | 0 2048 | 0 4096 | 153 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 86293 16384 |@ 1864 32768 | 116 65536 |@@@ 5918 131072 | 59 262144 | 7 524288 | 3 1048576 | 2 2097152 | 2 4194304 | 0 8388608 | 0
收集到以上数据后, 首先验证公式的正确性. 验证公式前, 需要解读explain的输出. 以及现有的2个常量.
digoal=# show seq_page_cost; seq_page_cost --------------- 1 (1 row) digoal=# show cpu_tuple_cost; cpu_tuple_cost ---------------- 0.01 (1 row)
公式正确 :
195393 = (shared read=)94393*1(seq_page_cost) + (rows=)10100000*0.01(cpu_tuple_cost) digoal=# select 94393+10100000*0.01; ?column? ----------- 195393.00 (1 row)
那么从stap中我们得到io的平均响应时间是14329纳秒(0.014329毫秒). 真实的执行时间是(3260.695 -0.839).
套用到公式中.
3260.695 -0.839 = 94393*0.014329 + 10100000*x x = 0.00018884145574257426
接下来要做的是调整seq_page_cost和cpu_tuple_cost, 重新执行SQL.
digoal=# set seq_page_cost=0.014329; SET digoal=# set cpu_tuple_cost=0.00018884145574257426; SET
重新执行SQL后, 我们看到评估出来的COST是3259.86, 实际的执行时间是1599.507ms.
不一致是因为现在用到了shared buffer, 已经没有直接读硬盘或者OS CACHE了.
digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ -- Seq Scan on postgres.tbl_cost_align (cost=0.00..3259.86 rows=10100000 width=45) (actual time=0.011..1599.507 rows=10100000 loops=1 ) Output: id, info, crt_time Buffers: shared hit=94393 Total runtime: 2617.152 ms (4 rows)
可以重启数据库并清除CACHE来测试, 一定会得到满意的答案.
pg93@db-172-16-3-150-> taskset -c 1 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1 [root@db-172-16-3-150 ~]# sync; echo 3 > /proc/sys/vm/drop_caches digoal=# set seq_page_cost=0.014329; SET digoal=# set cpu_tuple_cost=0.00018884145574257426; SET digoal=# explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ -- Seq Scan on postgres.tbl_cost_align (cost=0.00..3259.86 rows=10100000 width=45) (actual time=0.915..3318.443 rows=10100000 loops=1 ) Output: id, info, crt_time Buffers: shared read=94393 Total runtime: 4380.828 ms (4 rows)
现在是完全从硬盘读取, 所以得出的cost就和实际执行时间相当接近了3259.86 VS (3318.443 - 0.915).
random_page_cost 本文还是通过stap跟踪来获得.
cpu_index_tuple_cost 和 cpu_operator_cost 两个未知数不是很好推算, 基本上出现cpu_index_tuple_cost 的场景, 另一个cpu_operator_cost 也出现了, 所以2个未知数都是同时出现.
那么我们只能给他们来个比例. 或者能够直接跟踪到其中的一个未知数, 才能得出另一个未知数.
本文利用cpu_index_tuple_cost 和 cpu_operator_cost的默认占比来求得这两个值.
首先我们还是要确定公式, 为了方便公式验证, 把所有的常量都设置为1.
digoal=# set random_page_cost=1; SET digoal=# set cpu_tuple_cost=1; SET digoal=# set cpu_index_tuple_cost=1; SET digoal=# set cpu_operator_cost=1; SET digoal=# set enable_seqscan=off; set enable_bitmapscan=off; explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963; SET SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ ------------------------- Index Scan using idx_tbl_cost_align_id on postgres.tbl_cost_align (cost=174.00..20181.67 rows=5031 width=45) (actual time=0.029..1 7.773 rows=5037 loops=1) Output: id, info, crt_time Index Cond: (tbl_cost_align.id > 1998999963) Buffers: shared hit=5054 Total runtime: 18.477 ms (5 rows)
执行计划表明这是个索引扫描, 至于扫了多少个数据块是未知的, 索引的tuples也是未知的, 已知的是cost和rows.
20181.67 = blocks*random_page_cost + cpu_tuple_cost*5031 + cpu_index_tuple_cost*5031 + cpu_operator_cost*?
求这个问号, 可以通过更改cpu_operator_cost来得到.
digoal=# set cpu_operator_cost=2; SET digoal=# set enable_seqscan=off; set enable_bitmapscan=off; explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963; SET SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ ------------------------ Index Scan using idx_tbl_cost_align_id on postgres.tbl_cost_align (cost=348.00..25386.67 rows=5031 width=45) (actual time=0.013..5 .785 rows=5037 loops=1) Output: id, info, crt_time Index Cond: (tbl_cost_align.id > 1998999963) Buffers: shared hit=5054 Total runtime: 6.336 ms (5 rows)
25386.67-20181.67 = 5205 得到本例通过索引扫描的条数. 等式就变成了 20181.67 = blocks*random_page_cost + cpu_tuple_cost*5031 + cpu_index_tuple_cost*5031 + cpu_operator_cost*5205
接下来要求blocks, 也就是扫描的随机页数.
通过调整random_page_cost得到.
digoal=# set random_page_cost = 2; SET digoal=# set enable_seqscan=off; set enable_bitmapscan=off; explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963; SET SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ ------------------------ Index Scan using idx_tbl_cost_align_id on postgres.tbl_cost_align (cost=348.00..30301.33 rows=5031 width=45) (actual time=0.013..5 .778 rows=5037 loops=1) Output: id, info, crt_time Index Cond: (tbl_cost_align.id > 1998999963) Buffers: shared hit=5054 Total runtime: 6.331 ms (5 rows)
30301.33-25386.67 = 4914.66
得到blocks = 4914.66.
更新等式 :
20181.67 = 4914.66*random_page_cost + cpu_tuple_cost*5031 + cpu_index_tuple_cost*5031 + cpu_operator_cost*5205
接下来要做的是通过stap统计出random_page_cost.
pg93@db-172-16-3-150-> taskset -c 1 /home/pg93/pgsql9.3.1/bin/postgres >/dev/null 2>&1 [root@db-172-16-3-150 ~]# sync; echo 3 > /proc/sys/vm/drop_caches digoal=# select pg_backend_pid(); pg_backend_pid ---------------- 10009 (1 row) [root@db-172-16-3-150 ~]# taskset -c 2 stap -e ' global a probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__start") { delete a println("query__start ", user_string($arg1), "pid:", pid()) } probe vfs.read.return { t = gettimeofday_ns() - @entry(gettimeofday_ns()) # if (execname() == "postgres" && devname != "N/A") a[pid()] <<< t } probe process("/home/pg93/pgsql9.3.1/bin/postgres").mark("query__done") { if (@count(a[pid()])) printdln("**", pid(), @count(a[pid()]), @avg(a[pid()])) println("query__done ", user_string($arg1), "pid:", pid()) if (@count(a[pid()])) { println(@hist_log(a[pid()])) #println(@hist_linear(a[pid()],1024,4096,100)) } delete a }' -x 10009 digoal=# set enable_seqscan=off; set enable_bitmapscan=off; explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963; SET SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ ------------------------ Index Scan using idx_tbl_cost_align_id on postgres.tbl_cost_align (cost=0.43..5003.15 rows=5031 width=45) (actual time=0.609..1844 .415 rows=5037 loops=1) Output: id, info, crt_time Index Cond: (tbl_cost_align.id > 1998999963) Buffers: shared hit=152 read=4902 Total runtime: 1846.683 ms (5 rows) query__start explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963;pid:10009 10009**4946**368362 query__done explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963;pid:10009 value |-------------------------------------------------- count 2048 | 0 4096 | 0 8192 | 33 16384 | 2 32768 | 6 65536 | 4 131072 |@@@@@@@@@@@@@@@@@@@ 1193 262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2971 524288 |@@@@@@@@@@@@ 729 1048576 | 2 2097152 | 5 4194304 | 0 8388608 | 1 16777216 | 0 33554432 | 0
更新等式, 使用时间等式 :
等式1 :
1844.415 = 4914.66*0.368362 + 0.00018884145574257426*5031 + cpu_index_tuple_cost*5031 + cpu_operator_cost*5205
cpu_tuple_cost用例子1中计算得到的0.00018884145574257426
cpu_index_tuple_cost和cpu_operator_cost的比例用系统默认的2 : 1.
等式2 :
cpu_index_tuple_cost/cpu_operator_cost = 2
最终得到 :
cpu_index_tuple_cost = 0.00433497085216479990 cpu_operator_cost = 0.00216748542608239995
结合例子1 得到的两个常量, 所有的5个常量值就调整好了.
digoal=# set cpu_tuple_cost=0.00018884145574257426; SET digoal=# set cpu_index_tuple_cost = 0.00433497085216479990; SET digoal=# set cpu_operator_cost = 0.00216748542608239995; SET digoal=# set seq_page_cost=0.014329; SET digoal=# set random_page_cost = 0.368362; SET digoal=# set enable_seqscan=off; set enable_bitmapscan=off; explain (analyze,verbose,costs,buffers,timing) select * from tbl_cost_align where id>1998999963; SET SET QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ ------------------------ Index Scan using idx_tbl_cost_align_id on postgres.tbl_cost_align (cost=0.38..1844.42 rows=5031 width=45) (actual time=0.553..1346 .468 rows=5037 loops=1) Output: id, info, crt_time Index Cond: (tbl_cost_align.id > 1998999963) Buffers: shared hit=152 read=4902 Total runtime: 1348.428 ms (5 rows)
以后使用调整后的cost常量, 就可以估算出SQL的真实执行时间, 真实执行时间会因为shared buffer hit以及os cache比explain得到的值略短, 但是已经非常接近了.
digoal=# set enable_seqscan=off; set enable_bitmapscan=off; explain select * from tbl_cost_align where id>1998999963; SET SET QUERY PLAN --------------------------------------------------------------------------------------------------- Index Scan using idx_tbl_cost_align_id on tbl_cost_align (cost=0.38..1844.42 rows=5031 width=45) Index Cond: (id > 1998999963) (2 rows)
1.
2.
3.
4.
5.
6.
7.
8.
9. src/backend/optimizer/path/costsize.c
转载地址:http://uowsx.baihongyu.com/