HPvertica收集表的统计信息方法介绍
一.通过PROJECTION_COLUMNS的statistics_type查看表的列的统计信息类型,和收集时间
SELECT projection_name,projection_column_name,
statistics_type,statistics_updated_timestamp,
column_position
FROM PROJECTION_COLUMNS
where table_schema='FDL' and table_name = 'FDL_T01_XXX';
--statistics_type最好为full
二.通过PROJECTIONS的HAS_STATISTICS判断表是否有统计信息
ELECT projection_basename,projection_name
segment_expression,segment_range,verified_fault_tolerance,
HAS_STATISTICS,is_segmented
FROM PROJECTIONS
where projection_schema='FDL' and projection_basename='FDL_T01_XXX';
--同样可以查看分段的类型,分布
三.通过PROJECTION_STORAGE的row_count,used_bytes等统计信息,判断统计信息是否准确
select NODE_NAME,projection_name,projection_column_count,
row_count,used_bytes
from PROJECTION_STORAGE
where ANCHOR_TABLE_SCHEMA='FDL' and ANCHOR_TABLE_NAME='FDL_T01_XXX'
order by projection_name,node_name;
四.收集单表统计信息
select ANALYZE_STATISTICS('FDL.FDL_T01_XXX');
五.批量收集表的统计信息
select 'select ANALYZE_STATISTICS('''||(table_schema||'.'||table_name)||''');' from tables where table_schema='FDL' order by table_name;
注1.关于统计信息
Number of rows in the table
Number of distinct values of each column (cardinality)
Minimum/maximum values of each column
Ahistogramof the distribution of values in each column
Disk space that the column occupie
注2.关于执行计划优化器
The access path with the fewest expected I/O operations and lowest CPU, memory, and network usage
Multiple eligibleprojectionsto answer the query
Join types of differing algorithms (hash join/merge join or group by hash/group by pipelined)
The order in which to perform joins
Query predicates
Data re-distribution algorithms (broadcast and re-segmentation) across nodes in the cluster
下一篇: 2019 蓝桥杯省赛 B 组模拟赛(一)