MySQL分区表(理论+实战)
参考
https://dev.mysql.com/doc/refman/5.7/en/create-table.html#create-table-partitioning
分类
1.HASH
哈希一个或多个列,以创建用于放置和定位行的键。expr是一个使用一个或多个表列的表达式。这可以是任何有效的MySQL表达式(包括MySQL函数),生成一个整数值。例如,它们都是使用分区BY散列的有效CREATE TABLE语句
CREATE TABLE t1 (col1 INT, col2 CHAR(5))
PARTITION BY HASH(col1);
CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATETIME)
PARTITION BY HASH ( YEAR(col3) );
2.KEY
这类似于散列,只是MySQL提供了散列函数,以保证数据的均匀分布。column_list参数只是一个包含1个或多个表列的列表(最多16个)。这个例子展示了一个简单的按键分区的表,有4个分区
CREATE TABLE tk (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY KEY(col3)
PARTITIONS 4;
3.RANGE
在本例中,expr使用一组小于操作符的值显示一个值范围。在使用范围分区时,必须使用小于的值定义至少一个分区。不能在范围分区中使用值
CREATE TABLE t1 (
year_col INT,
some_data INT
)
PARTITION BY RANGE (year_col) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1995),
PARTITION p2 VALUES LESS THAN (1999),
PARTITION p3 VALUES LESS THAN (2002),
PARTITION p4 VALUES LESS THAN (2006),
PARTITION p5 VALUES LESS THAN MAXVALUE
);
4.LIST
这在根据表列(可能的值的限制集,例如州或国家代码)分配分区时非常有用。在这种情况下,可以将属于某个州或国家的所有行分配给单个分区,或者为某个州或国家集保留一个分区。它与RANGE类似,只是可以使用IN中的值为每个分区指定允许的值。值IN与要匹配的值列表一起使用。例如,你可以创建一个分区方案如下
CREATE TABLE client_firms (
id INT,
name VARCHAR(35)
)
PARTITION BY LIST (id) (
PARTITION r0 VALUES IN (1, 5, 9, 13, 17, 21),
PARTITION r1 VALUES IN (2, 6, 10, 14, 18, 22),
PARTITION r2 VALUES IN (3, 7, 11, 15, 19, 23),
PARTITION r3 VALUES IN (4, 8, 12, 16, 20, 24)
);
说明
一般来说生产系统中大多以时间作为分区,所以本文仅针对range分区进行详细介绍
实战
准备测试表和测试数据
drop table if exists test_table;
create table if not exists test_table (
data_date date,
json_str json
);
insert into test_table values('2019-01-01','{"name":"xavier","age":"18","gender":"male"}');
insert into test_table values('2019-01-02','{"name":"xavier","age":"19","gender":"male"}');
insert into test_table values('2019-01-02','{"name":"xavier","age":"20","gender":"male"}');
insert into test_table values('2019-01-03','{"name":"xavier","age":"21","gender":"male"}');
insert into test_table values('2019-01-03','{"name":"xavier","age":"21","gender":"male"}');
insert into test_table values('2019-01-03','{"name":"xavier","age":"22","gender":"male"}');
insert into test_table values('2019-01-04','{"name":"xavier","age":"23","gender":"male"}');
insert into test_table values('2019-01-04','{"name":"xavier","age":"24","gender":"male"}');
insert into test_table values('2019-01-04','{"name":"xavier","age":"25","gender":"male"}');
insert into test_table values('2019-01-04','{"name":"xavier","age":"26","gender":"male"}');
insert into test_table values('2019-01-05','{"name":"xavier","age":"27","gender":"male"}');
insert into test_table values('2019-01-05','{"name":"xavier","age":"28","gender":"male"}');
insert into test_table values('2019-01-05','{"name":"xavier","age":"29","gender":"male"}');
insert into test_table values('2019-01-05','{"name":"xavier","age":"30","gender":"male"}');
insert into test_table values('2019-01-05','{"name":"xavier","age":"31","gender":"male"}');
查看普通表的执行计划
explain select * from test_table;
可以发现没有分区,且全表扫描,接着创建时间分区表和测试数据(注意:分区表一定要预先创建好)
drop table if exists range_table;
create table if not exists range_table (
data_date datetime,
json_str json
)
partition by range (to_days(data_date)) (
partition p_20190101 values less than (to_days('2019-01-02')),
partition p_20190102 values less than (to_days('2019-01-03')),
partition p_20190103 values less than (to_days('2019-01-04')),
partition p_20190104 values less than (to_days('2019-01-05')),
partition p_maxvalue values less than maxvalue -- 容错
);
insert into range_table values('2019-01-01','{"name":"xavier","age":"18","gender":"male"}');
insert into range_table values('2019-01-02','{"name":"xavier","age":"19","gender":"male"}');
insert into range_table values('2019-01-02','{"name":"xavier","age":"20","gender":"male"}');
insert into range_table values('2019-01-03','{"name":"xavier","age":"21","gender":"male"}');
insert into range_table values('2019-01-03','{"name":"xavier","age":"21","gender":"male"}');
insert into range_table values('2019-01-03','{"name":"xavier","age":"22","gender":"male"}');
insert into range_table values('2019-01-04','{"name":"xavier","age":"23","gender":"male"}');
insert into range_table values('2019-01-04','{"name":"xavier","age":"24","gender":"male"}');
insert into range_table values('2019-01-04','{"name":"xavier","age":"25","gender":"male"}');
insert into range_table values('2019-01-04','{"name":"xavier","age":"26","gender":"male"}');
查看分区表的执行计划
explain select * from range_table;
此时就发现已经有了分区信息,说明分区已经生效,那比如我们查询‘2019-01-03’那天的数据
explain select * from range_table where data_date='2019-01-03';
可以发现直接锁定到了分区p_20190103中并且总记录是3条,很明显分区的优势已经体现出来了,不需要再进行全表扫描了
补充
因为这个分区是预先定义好的不会动态增加,所以需要一条SQL用来专门增加分区
alter table range_table reorganize partition p_maxvalue into (
partition p_20190105 values less than (to_days('2019-01-06')),partition p_maxvalue values less than (maxvalue)
);
更多详细分区信息如下
select table_name,partition_name,subpartition_name,partition_ordinal_position,partition_method,partition_expression,partition_description,table_rows,avg_row_length,data_length,create_time,update_time
from information_schema.partitions where table_schema='test' and table_name = 'range_table';