数据库中clusterfactor对执行计划的影响是什么

这篇文章主要讲解了“数据库中cluster factor对执行计划的影响是什么”，文中的讲解内容简单清晰，易于学习与理解，下面请大家跟着小编的思路慢慢深入，一起来研究和学习“数据库中cluster factor对执行计划的影响是什么”吧！

我们提供的服务有：网站建设、网站设计、微信公众号开发、网站优化、网站认证、英山ssl等。为上1000+企事业单位解决了网站和推广的问题。提供周到的售前咨询和贴心的售后服务，是有科学管理、有技术的英山网站制作公司

cluster factor对执行计划的影响

测试环境：Linux 7.6 + ORACLE 19.6.1

1.创建测试环境

1.1 创建测试表并插入数据

CZH@czhpdb > create table test_ffs as select * from hr.employees;
 
Table created.
 
CZH@czhpdb > insert into test_ffs select * from test_ffs;
 
Execution Plan
----------------------------------------------------------
Plan hash value: 296244252
 
---------------------------------------------------------------------------------------------
| Id  | Operation                        | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |          |   107 |  7383 |     3   (0)| 00:00:01 |
|   1 |  LOAD TABLE CONVENTIONAL         | TEST_FFS |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |          |   107 |  7383 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL             | TEST_FFS |   107 |  7383 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
 
Note
-----
   - dynamic statistics used: statistics for conventional DML
 
 
Statistics
----------------------------------------------------------
         72  recursive calls
         89  db block gets
         81  consistent gets
         12  physical reads
      21576  redo size
        195  bytes sent via SQL*Net to client
        394  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
        107  rows processed

上面autotrace执行计划可以看到两个新特性：

1.2 12c R1与19c两个新特性

1.2.1 12c R1新特性OPTIMIZER STATISTICS GATHERING：

# OPTIMIZER STATISTICS GATHERING：12cR1以后的新特性，direct path load时，空表第一次加载数据时会自动收集统计信息。

# Oracle Database 12c introduced online statistics gathering for CREATE TABLE AS SELECT statements and direct-path inserts.

1.2.2 19c新特性real-time statistics

Oracle Database 19c introduces real-time statistics
, which extend online support to conventional DML statements
. Because statistics can go stale between DBMS_STATS jobs, real-time statistics helps the optimizer generate more optimal plans.Whereas bulk load operations gather all necessary statistics, real-time statistics augment rather than replace traditional statistics.

· Oracle introduced new parameters

· "_optimizer_gather_stats_on_conventional_dml" and "_optimizer_use_stats_on_conventional_dml" which are true by default

· "_optimizer_stats_on_conventional_dml_sample_rate" at 100%

· How does real time statistics works?

· By default the "_optimizer_gather_stats_on_conventional_dml" is true so its automatically kicks off

· When a DML operation is currently modifying a table (conventional), Oracle Database dynamically computes values for the most essential statistics if the above parameter is on.

· Consider a example of table that is having lot of inserts and rows are increasing. Real-time statistics keep track of the increasing row count as rows are being inserted. If the optimizer performs a hard parse of a new query, then the optimizer can use the real-time statistics to obtain a more accurate cost estimate.

· DBA_TAB_COL_STATISTICS and DBA_TAB_STATISITICS has columns NOTES tell real time statistics have been used. STATS_ON_CONVENTIONAL_DML

SELECT NVL(PARTITION_NAME, 'GLOBAL') PARTITION_NAME, NUM_ROWS, BLOCKS, NOTES

FROM   USER_TAB_STATISTICS

WHERE  TABLE_NAME = 'SALES'

ORDER BY 1, 4;

PARTITION_NAM   NUM_ROWS     BLOCKS NOTES

------------- ---------- ---------- -------------------------

GLOBAL           1837686       3315 STATS_ON_CONVENTIONAL_DML

1.3 插入大量数据并收集统计信息

CZH@czhpdb > set autot off
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > insert into test_ffs select * from test_ffs;
CZH@czhpdb > commit;
 
CZH@czhpdb > CREATE INDEX IDX_TEST_FFS ON TEST_FFS(EMPLOYEE_ID);
 
CZH@czhpdb > EXEC DBMS_STATS.GATHER_TABLE_STATS(user,’TEST_FFS’,cascade=>true);

1.4 使用Hint /+ gather_plan_statistics /获取sql真实执行计划

# sqlplus中set autotrace与explain plan for都是CBO预估出来的执行计划，可能与真实执行的并不相同，我们使用下面hint获取真实执行计划。

CZH@czhpdb > SELECT /*+ gather_plan_statistics */ salary from test_ffs where employee_id < 100;
 
no rows selected
 
真实执行计划：
 
SYS@orcl2 > select * from table(dbms_xplan.display_cursor('c9qg9su5khysd',null,'allstats last'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  c9qg9su5khysd, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics */ salary from test_ffs where
employee_id < 100
 
Plan hash value: 296244252
 
----------------------------------------------------------------------------------------
| Id  | Operation         | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |      1 |        |      0 |00:00:00.01 |    2288 |
|*  1 |  TABLE ACCESS FULL| TEST_FFS |      1 |      1 |      0 |00:00:00.01 |    2288 |
----------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("EMPLOYEE_ID"<100)

# 可以看到由于表中数据没有employee_id < 100，我们认为明显走索引的sql并未选择索引，那肯定是由于某些原因，cbo认为走索引并不是最优执行路径，我们就利用10053获取为什么cbo认为全表扫描cost更低。

SYS@orcl2 > alter system flush shared_pool;
 
System altered.

#如果不清空shared_pool或者使游标失效，软解析开启10053事件，不会生成trace文件。

CZH@czhpdb > ALTER SESSION SET EVENTS '10053 trace name context forever,level 1';
 
Session altered.
 
CZH@czhpdb > SELECT /*+ gather_plan_statistics */ salary from test_ffs where employee_id < 100;
 
no rows selected
 
CZH@czhpdb > ALTER SESSION SET EVENTS '10053 trace name context off';
 
Session altered.

19c 10053：

# 可以从下面10053看到DK(distinct key),CLUF(clustering factor),IX_SEL,下一步将根据几个参数计算为何CBO认为走索引cost会高于全表扫描。

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: TEST_FFS  Alias: TEST_FFSonline table stats for conventional DML (block count: 2263 row count: 219029)   used on (TEST_FFS) block count: 5 -> 2263, row count: 107 -> 219136
  #Rows: 219136  SSZ: 0  LGR: 0  #Blks:  2263  AvgRowLen:  69.00  NEB: 0  ChainCnt:  0.00  ScanRate:  0.00  SPC: 0  RFL: 0  RNF: 0  CBK: 0  CHR: 0  KQDFLG: 193
  #IMCUs: 0  IMCRowCnt: 0  IMCJournalRowCnt: 0  #IMCBlocks: 0  IMCQuotient: 0.000000
Index Stats::
  Index: IDX_TEST_FFS  Col#: 1
  LVLS: 1  #LB: 458  #DK: 107  LB/K: 4.00  DB/K: 1524.00  CLUF: 163174.00  NRW: 219136.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1
  KKEISFLG: 1 
try to generate single-table filter predicates from ORs for query block SEL$1 (#0)
finally: "TEST_FFS"."EMPLOYEE_ID"<100
 
=======================================
SPD: BEGIN context at query block level
=======================================
Query Block SEL$1 (#0)
Return code in qosdSetupDirCtx4QB: NOCTX
=====================================
SPD: END context at query block level
=====================================
Access path analysis for TEST_FFS
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for TEST_FFS[TEST_FFS] 
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
 
 kkecdn: Single Table Predicate:"TEST_FFS"."EMPLOYEE_ID"<100
online column stats for conventional DML used on (TEST_FFS.EMPLOYEE_ID) min: 100.00 -> 100.00, max: 206.00 -> 206.00, nnl: 0 -> 0, acl: 4 -> 0 
  Column (#1): EMPLOYEE_ID(NUMBER)
    AvgLen: 22 NDV: 107 Nulls: 0 Density: 0.009346 Min: 100.000000 Max: 206.000000
  Using density: 0.009346 of col #1 as selectivity of unpopular value pred
  Table: TEST_FFS  Alias: TEST_FFS
    Card: Original: 219136.000000  Rounded: 2048  Computed: 2048.000000  Non Adjusted: 2048.000000
  Scan IO  Cost (Disk) =   615.000000
  Scan CPU Cost (Disk) =   49272938.720000
  Cost of predicates:
    io = NOCOST, cpu = 50.000000, sel = 0.009346 flag = 2048  ("TEST_FFS"."EMPLOYEE_ID"<100)
  Total Scan IO  Cost  =   615.000000 (scan (Disk))
                         + 0.000000 (io filter eval) (= 0.000000 (per row) * 219136.000000 (#rows))
                       =   615.000000
  Total Scan CPU  Cost =   49272938.720000 (scan (Disk))
                         + 10956800.000000 (cpu filter eval) (= 50.000000 (per row) * 219136.000000 (#rows))
                       =   60229738.720000
  Access Path: TableScan
    Cost:  621.167026  Resp: 621.167026  Degree: 0
      Cost_io: 615.000000  Cost_cpu: 60229739
      Resp_io: 615.000000  Resp_cpu: 60229739
 ****** Costing Index IDX_TEST_FFS
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_SCAN
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = INDEX_FILTER
  Using density: 0.009346 of col #1 as selectivity of unpopular value pred
  Access Path: index (RangeScan)
    Index: IDX_TEST_FFS
    resc_io: 1531.000000  resc_cpu: 11906445
    ix_sel: 0.009346  ix_sel_with_filters: 0.009346 
    Cost: 1532.219121  Resp: 1532.219121  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 621.167026  Degree: 1  Resp: 621.167026  Card: 2048.000000  Bytes: 0.000000
 
online column stats for conventional DML used on (TEST_FFS.SALARY) min: 2100.00 -> 2100.00, max: 24000.00 -> 24000.00, nnl: 0 -> 0, acl: 4 -> 0 
***************************************

2.调整cluster factor

2.1 cluster factor聚簇因子说明

cluster factor表示索引顺序与表存储数据一致性，顺序扫描索引时，如果索引键值扫描到键值对应的表数据行对应的数据块发生变化时，则cluster factor加1，所以cluster factor最低为表数据块，最大为表数据行，与表存储顺序高度相关，如果表是按照顺序插入，则cluster factor较低，如果表数据为无序插入，则cluster factor较高，这就是为什么同样表数据情况下，执行计划会有时候有差别的原因。

索引扫描成本公式：

INDEX ACCESS I/O COST=BLEVEL+CEIL(#LEAF_BLOCKS*IX_SEL)

TABLE_ACCESS I/O COST=CEIL(CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)

IX_SEL与IX_SEL_WITH_FILTERS为索引选择率与索引带谓词选择率，一般为1/(DISTINCT KEY)值，本例中走全表扫描时，IX_SEL=1/107=0.009345，则计算走索引成本为：

ACCESS INDEX COST=INDEX ACCESS I/O COST + TABLE ACCESS I/O COST=2+CEIL(458*0.009345)+CEIL(163174*0.009345)=1540

近似等于CBO预计出来的1532，是高于全表扫的COST 615的，所以选择走了全表扫描。

2.2 调整cluster factor

重建表，order by排序，降低cluster factor

CZH@czhpdb > create table test_ffs_03 as select * from test_ffs_02 order by employee_id;
 
Table created.
 
CZH@czhpdb > create index idx_test_ffs_03 on test_ffs_03(employee_id);
 
Index created.
 
CZH@czhpdb > select clustering_factor,index_name from user_indexes where index_name='IDX_TEST_FFS_03';
 
                       CLUSTERING_FACTOR INDEX_NAME
---------------------------------------- --------------------
                                    1128 IDX_TEST_FFS_03

# 可以看到cluster factor明显降低。

CZH@czhpdb > select /*+ gather_plan_statistics */ salary from test_ffs_03 where employee_id < 100;
 
no rows selected
 
SYS@orcl2 > select * from table(dbms_xplan.display_cursor('8fpk2b8vzn5y2',null,'allstats last'));
 
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8fpk2b8vzn5y2, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ salary from test_ffs_03 where
employee_id < 100
 
Plan hash value: 704625359
 
--------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |      1 |        |      0 |00:00:00.01 |       2 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| TEST_FFS_03     |      1 |   1024 |      0 |00:00:00.01 |       2 |      1 |
|*  2 |   INDEX RANGE SCAN                  | IDX_TEST_FFS_03 |      1 |   1024 |      0 |00:00:00.01 |       2 |      1 |
--------------------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("EMPLOYEE_ID"<100)

感谢各位的阅读，以上就是“数据库中cluster factor对执行计划的影响是什么”的内容了，经过本文的学习后，相信大家对数据库中cluster factor对执行计划的影响是什么这一问题有了更深刻的体会，具体使用情况还需要大家实践验证。这里是创新互联，小编将为大家推送更多相关知识点的文章，欢迎关注！

网站栏目：数据库中clusterfactor对执行计划的影响是什么
文章网址：http://bjjierui.cn/article/jpioge.html

定制建站费用3500元

品牌网站建设费用6000元

商城网站建设费用8000元

手机微信网站建站3000元

建站知识

数据库中clusterfactor对执行计划的影响是什么

cluster factor对执行计划的影响

1.创建测试环境

1.1 创建测试表并插入数据

1.2 12c R1与19c两个新特性

1.2.1 12c R1新特性OPTIMIZER STATISTICS GATHERING：

1.2.2 19c新特性real-time statistics

1.3 插入大量数据并收集统计信息

1.4 使用Hint /+ gather_plan_statistics /获取sql真实执行计划

2.调整cluster factor

2.1 cluster factor聚簇因子说明

2.2 调整cluster factor

其他资讯

定制建站费用3500元

品牌网站建设费用6000元

商城网站建设费用8000元

手机微信网站建站3000元

建站知识

数据库中clusterfactor对执行计划的影响是什么

cluster factor对执行计划的影响

1.创建测试环境

1.1 创建测试表并插入数据

1.2 12c R1与19c两个新特性

1.2.1 12c R1新特性OPTIMIZER STATISTICS GATHERING：

1.2.2 19c新特性real-time statistics

1.3 插入大量数据并收集统计信息

1.4 使用Hint /*+ gather_plan_statistics */获取sql真实执行计划

2.调整cluster factor

2.1 cluster factor聚簇因子说明

2.2 调整cluster factor

其他资讯

1.4 使用Hint /+ gather_plan_statistics /获取sql真实执行计划