项目上线后，谈一下感触比较深的一点：查询优化

程序员文章站 2022-04-09 15:56:39

前言开心一刻儿子有道题不会做，喊我过去教他。我推了推一旁的老公：我头疼，你去吧。老公不动，我：零花钱涨一千。话音刚落，老公就屁颠屁颠跑去儿子房间。进去不到几分钟，一声怒吼伴随着儿子的哭声传来的瞬间，老公从儿子房间出来，边走边说：“朽木不可雕也。”儿子从房间探出半个身子，一脸委屈：“爸爸也不会做， ......

前言

　　开心一刻

　　　　儿子有道题不会做，喊我过去教他。我推了推一旁的老公：我头疼，你去吧。老公不动，我：零花钱涨一千。话音刚落，老公就屁颠屁颠跑去儿子房间。进去不到几分钟，一声怒吼伴随着儿子的哭声传来的瞬间，老公从儿子房间出来，边走边说：“朽木不可雕也。”儿子从房间探出半个身子，一脸委屈：“爸爸也不会做，他说给我一块钱，让我明天早点去学校抄同学的。还让我不要告诉你，我不肯，他就吼我。”

项目上线后，谈一下感触比较深的一点：查询优化

问题背景

　　前段时间，被紧急调到一个新项目，支撑新项目的开发。跌跌撞撞之下，项目也正常上线了，期间收获颇多，无论是业务上的，还是业务之外的。业务上的就不多说了，不具通用性，意义不大，有一点业务之外的东东给我的感触比较深，特记录下来，与大家分享下：查询优化

查询优化

　　完整示例工程：，包括数据库表的 ddl 和 dml，以及数据批量的生成

　　相关表

　　　　涉及的表不多，一共三张：额度表、记录表、存款表

　　　　额度表 t_custmor_credit

create table t_customer_credit (
  id int(11) unsigned not null auto_increment comment '自增主键',
  login_name varchar(50) not null comment '名称',
    credit_type tinyint(1) not null comment '额度类型，1:*资金,2:冻结资金,3:优惠',
    amount decimal(22,6) not null default '0.00000' comment '额度值',
    create_by varchar(50) not null comment '创建者',
    create_time datetime not null default current_timestamp comment '创建时间',
    update_time datetime not null default current_timestamp on update current_timestamp comment '创建时间',
    update_by varchar(50) not null comment '修改者',
  primary key (id)
);

　　　　　　记录每个顾客的当前额度，额度一共分三种：*资金、冻结资金和优惠，也就是说每个顾客会有 3 条记录来表示他的各个额度。表中数据如下

项目上线后，谈一下感触比较深的一点：查询优化

　　　　额度记录 t_custmor_credit_record

create table t_customer_credit_record (
  id int(11) unsigned not null auto_increment comment '自增主键',
  login_name varchar(50) not null comment '名称',
    credit_type tinyint(1) not null comment '额度类型，参考t_custmor_credit的credit_type',
    bill_no varchar(50) not null comment '订单号',
    amount_before decimal(22,6) not null default '0.00000' comment '前额度值',
    amount_change decimal(22,6) not null default '0.00000' comment '额度变化值',
    amount_after decimal(22,6) not null default '0.00000' comment '后额度值',
    create_by varchar(50) not null comment '创建者',
    create_time datetime not null default current_timestamp comment '创建时间',
    remark varchar(500) not null default '' comment '备注',
  primary key (id)
);

　　　　　　记录顾客额度的每一次变化，只要有额度变化（不管是哪个额度进行了变化），都会新增3条记录，每个类型的额度都会新增一条记录。另外，该表只会有数据的插入，不会有数据的删、改。表中数据如下

项目上线后，谈一下感触比较深的一点：查询优化

　　　　存款表 t_custmor_deposit

create table t_customer_deposit (
  id int(11) unsigned not null auto_increment comment '自增主键',
  login_name varchar(50) not null comment '名称',
    bill_no varchar(50) not null comment '订单号',
    amount decimal(22,6) not null default '0.00000' comment '存款金额',
    deposit_state tinyint(1) not null comment '存款状态: 1成功,2失败,3未知',
    channal tinyint(2) not null comment '存款渠道: 1:银联,2支付宝,3微信',
    create_by varchar(50) not null comment '创建者',
    create_time datetime not null default current_timestamp comment '创建时间',
    remark varchar(500) not null default '' comment '备注',
  primary key (id)
);

　　　　　　记录顾客的每一次存款，该表只会有数据的插入，不会有数据的删、改。表中数据如下

项目上线后，谈一下感触比较深的一点：查询优化

　　产品需求

　　　　分页展示如下列表（具体的过滤条件就不列了，我们就当是初始状态，还未输入过滤条件）

项目上线后，谈一下感触比较深的一点：查询优化

　　　　实现比较简单，t_custmor_credit_record 左关联 t_custmor_deposit 就好，但是我们的额度记录表与需求列表有些许的出入，需要做一下简单的行转列。

　　需求实现

　　　　我们先来看看最初的sql查询，这可能是很多人最容易想到的

select min(tcd.channal) channal, max(tccr.id) mid,tccr.login_name,tccr.bill_no,tccr.create_time,
    if(credit_type=1,amount_before,0) as freebefore,
    if(credit_type=1,amount_change,0) as freechange,
    if(credit_type=1,amount_after,0) as freeafter,
    if(credit_type=2,amount_before,0) as freezebefore,
    if(credit_type=2,amount_change,0) as freezechange,
    if(credit_type=2,amount_after,0) as freezeafter,
    if(credit_type=3,amount_before,0) as promotionbefore,
    if(credit_type=3,amount_change,0) as promotionchange,
    if(credit_type=3,amount_after,0) as promotionafter
from t_customer_credit_record tccr
left join t_customer_deposit tcd on tccr.bill_no = tcd.bill_no
group by tccr.bill_no,tccr.login_name,tccr.create_time
order by mid desc
limit 0, 10;

　　　　数据量少的时候，也许能在我们接受的时间内查出我们需要的结果，一旦数据量多了，这个sql就跑不动了；我们先看下 60w 数据的情况下，我们只进行 t_custmor_credit_record 单表查询

select max(id) mid,login_name,bill_no,create_time,
    if(credit_type=1,amount_before,0) as freebefore,
    if(credit_type=1,amount_change,0) as freechange,
    if(credit_type=1,amount_after,0) as freeafter,
    if(credit_type=2,amount_before,0) as freezebefore,
    if(credit_type=2,amount_change,0) as freezechange,
    if(credit_type=2,amount_after,0) as freezeafter,
    if(credit_type=3,amount_before,0) as promotionbefore,
    if(credit_type=3,amount_change,0) as promotionchange,
    if(credit_type=3,amount_after,0) as promotionafter
from t_customer_credit_record
group by bill_no,login_name,create_time
order by mid desc
limit 0, 10;

　　　　效果如下

项目上线后，谈一下感触比较深的一点：查询优化

　　　　花了近 8 秒，这还只是单表，如果执行上面的联表sql，那时间又得增加不少（我试验的结果是直接卡住了，看不到查询结果）；

　　优化处理

　　　　加索引

　　　　　　查询慢的时候，我们最容易想到的优化方式往往就是加索引；上述sql执行的时候，t_custmor_credit_record 和 t_custmor_deposit都没有建索引（主键索引除外），那么我们就加索引呗。我的项目中加的是唯一索引，做了唯一约束，那我这里也加唯一索引

alter table t_customer_credit_record add unique uk_unique (bill_no,login_name,create_time,credit_type);
alter table t_customer_deposit add unique uk_billno (bill_no);

　　　　　　此时我们看下sql执行效果

项目上线后，谈一下感触比较深的一点：查询优化

　　　　　　我们发现，t_custmor_credit_record 单表查询的效率几乎没变，将近 8 秒，但 t_custmor_credit_record 与 t_custmor_deposit 联表的查询却在 11 秒内有结果了。加了索引为什么还这么慢了？难道没走索引？

项目上线后，谈一下感触比较深的一点：查询优化

　　　　　　我们是不是发现了什么？ if函数对联表查询是否走索引有影响，也对单表的查询速度有影响。上图中的 t_custmor_credit_record 单表查询，有if函数，查询时间近 8 秒，没有if函数，查询时间 2 秒左右；t_custmor_credit_record 与 t_custmor_deposit 联表查，有if函数，t_custmor_credit_record 走的是全表查，查询时间近 11 秒，没有if函数，t_custmor_credit_record 走的是索引，查询时间 3 秒不到。那么我们有没有什么办法拿掉这个if函数呢？

　　　　使用 case...when....then 代替 if

select max(id) mid,login_name,bill_no,create_time,
    case credit_type when 1 then amount_before else 0 end as freebefore,
    case credit_type when 1 then amount_change else 0 end as freechange,
    case credit_type when 1 then amount_after else 0 end as freeafter,
    case credit_type when 2 then amount_before else 0 end as freechange,
    case credit_type when 2 then amount_change else 0 end as freechange,
    case credit_type when 2 then amount_after else 0 end as freechange,
    case credit_type when 3 then amount_before else 0 end as promotionbefore,
    case credit_type when 3 then amount_change else 0 end as promotionchange,
    case credit_type when 3 then amount_after else 0 end as promotionafter
from t_customer_credit_record
group by bill_no,login_name,create_time
order by mid desc
limit 0, 10;

select min(tcd.channal) channal, max(tccr.id) mid,tccr.login_name,tccr.bill_no,tccr.create_time,
    case credit_type when 1 then amount_before else 0 end as freebefore,
    case credit_type when 1 then amount_change else 0 end as freechange,
    case credit_type when 1 then amount_after else 0 end as freeafter,
    case credit_type when 2 then amount_before else 0 end as freechange,
    case credit_type when 2 then amount_change else 0 end as freechange,
    case credit_type when 2 then amount_after else 0 end as freechange,
    case credit_type when 3 then amount_before else 0 end as promotionbefore,
    case credit_type when 3 then amount_change else 0 end as promotionchange,
    case credit_type when 3 then amount_after else 0 end as promotionafter
from t_customer_credit_record tccr
left join t_customer_deposit tcd on tccr.bill_no = tcd.bill_no
group by tccr.bill_no,tccr.login_name,tccr.create_time
order by mid desc
limit 0, 10;

项目上线后，谈一下感触比较深的一点：查询优化

　　　　　　我们可以看到，执行时间与 if 所差无几，执行计划也是与 if 的一致，这也就反映出不是 if的问题，应该是 group by 的问题。我们用 group by 结合 if（或 case...when....then），就是为了将 3 条额度记录合并成一条、行转列之后输出我们想要的结果，那有没有不用 group by、又能实现我们需求的方式了？

　　　　自联代替 group by

　　　　　　我们再仔细琢磨下这个需求，咋一看，确实需要行转列，那么就需要用到 group by，那么效率也就低了，这似乎是无解了？真的非要行转列吗，假设我们将额度记录拆分成 3 张表：一张表只存*资金的额度变化、一张表只存冻结资金的额度变化、一张表只存优惠的额度变化，这样是不是只需要联表查而不要用 group by 来进行行转列了？有小伙伴有可能会问：t_custmor_credit_record 表已经定了，数据都跑了不少了，再将其进行拆分，既要改表（同时还要迁移数据），还要改代码，工程量会很大！我们换个角度来看 t_custmor_credit_record ，目前它是 3 中额度记录的一个总和表，我们能不能从它的身上做文章，变化出我们想要的那 3 张表，然后进行联表查询呢？肯定可以的，类似如下

-- *资金额度记录表
select * from t_customer_credit_record where credit_type = 1;
-- 冻结资金额度记录表
select * from t_customer_credit_record where credit_type = 2;
-- 优惠额度记录表
select * from t_customer_credit_record where credit_type = 3;

　　　　　　接下来的 sql 怎么写，我想大家都知道了吧，自联就行了，写法有很多种，常见的写法有如下 4 种

-- 不用group by,做法1, 个人比较推荐， 但此种方式不支持存款表的过滤条件
select d.channal,a.amount_before as freebefore,a.amount_change as freechange, a.amount_after freeafter,
    b.amount_before as freezebefore,b.amount_change as freezechange, b.amount_after freezeafter,
    c.amount_before as promotionbefore,c.amount_change as promotionchange, c.amount_after promotionafter
from (
    select * from t_customer_credit_record where credit_type = 1 order by id desc limit 0, 10
) a
left join t_customer_credit_record b on a.bill_no = b.bill_no and b.credit_type = 2
left join t_customer_credit_record c on a.bill_no = c.bill_no and c.credit_type = 3
left join t_customer_deposit d on a.bill_no = d.bill_no;

-- 不用group by,做法2, 此种方式支持存款表的过滤条件
select a.channal,a.amount_before as freebefore,a.amount_change as freechange, a.amount_after freeafter,
    b.amount_before as freezebefore,b.amount_change as freezechange, b.amount_after freezeafter,
    c.amount_before as promotionbefore,c.amount_change as promotionchange, c.amount_after promotionafter
from (
    select r.*,d.channal from t_customer_credit_record r left join t_customer_deposit d on r.bill_no = d.bill_no
    where r.credit_type = 1 order by r.id desc limit 0, 10
) a
left join t_customer_credit_record b on a.bill_no = b.bill_no and b.credit_type = 2
left join t_customer_credit_record c on a.bill_no = c.bill_no and c.credit_type = 3;

-- 不用group by,做法3, 这是最容易想到的方法
select d.channal,a.amount_before as freebefore,a.amount_change as freechange, a.amount_after freeafter,
    b.amount_before as freezebefore,b.amount_change as freezechange, b.amount_after freezeafter,
    c.amount_before as promotionbefore,c.amount_change as promotionchange, c.amount_after promotionafter
from t_customer_credit_record a
left join t_customer_credit_record b on a.bill_no = b.bill_no
left join t_customer_credit_record c on a.bill_no = c.bill_no
left join t_customer_deposit d on a.bill_no = d.bill_no
where a.credit_type = 1 and b.credit_type = 2 and c.credit_type = 3 
order by a.id desc limit 0, 10;

-- 不用group by,做法4
select d.channal,a.amount_before as freebefore,a.amount_change as freechange, a.amount_after freeafter,
    b.amount_before as freezebefore,b.amount_change as freezechange, b.amount_after freezeafter,
    c.amount_before as promotionbefore,c.amount_change as promotionchange, c.amount_after promotionafter
from t_customer_credit_record a
left join t_customer_credit_record b on a.bill_no = b.bill_no and b.credit_type = 2
left join t_customer_credit_record c on a.bill_no = c.bill_no and c.credit_type = 3
left join t_customer_deposit d on a.bill_no = d.bill_no
where a.credit_type = 1 
order by a.id desc limit 0, 10;

　　　　　　执行结果如下

项目上线后，谈一下感触比较深的一点：查询优化

　　　　　　就目前的数据量而言，4 种写法的效率一样，但是数据量再往上走，它们之前还是有性能差别的，大家可以仔细看看这 4 个 sql 的执行计划，它们之间还是有区别的。最终我的项目中采用的是第一种写法

　　　　表重新设计

　　　　　　我们回过头去看看 t_customer_credit 和 t_custmor_credit_record，是否真的有必要用 3 条记录来存放顾客的 3 种额度，一条记录将用户的 3 种额度都记录下来不是更好吗？如下所示

-- 自认为更好的表设计
drop table if exists t_customer_credit_plus;
create table t_customer_credit_plus (
  id int(11) unsigned not null auto_increment comment '自增主键',
  login_name varchar(50) not null comment '登录名',
    free_amount decimal(22,6) not null default '0.00000' comment '*资金额度',
    freeze_amount decimal(22,6) not null default '0.00000' comment '冻结资金额度',
    promotion_amount decimal(22,6) not null default '0.00000' comment '优惠资金额度',
    create_by varchar(50) not null comment '创建者',
    create_time datetime not null default current_timestamp comment '创建时间',
    update_time datetime not null default current_timestamp on update current_timestamp comment '创建时间',
    update_by varchar(50) not null comment '修改者',
  primary key (id),
  unique key `uk_login_name` (`login_name`)
);
drop table if exists t_customer_credit_record_plus;
create table t_customer_credit_record_plus (
  id int(11) unsigned not null auto_increment comment '自增主键',
  login_name varchar(50) not null comment '登录名',
    bill_no varchar(50) not null comment '订单号',
    free_amount_before decimal(22,6) not null default '0.00000' comment '*资金前额度值',
    free_amount_change decimal(22,6) not null default '0.00000' comment '*资金前额度变化值',
    free_amount_after decimal(22,6) not null default '0.00000' comment '*资金前后额度值',
    freeze_amount_before decimal(22,6) not null default '0.00000' comment '冻结资金前额度值',
    freeze_amount_change decimal(22,6) not null default '0.00000' comment '冻结资金额度变化值',
    freeze_amount_after decimal(22,6) not null default '0.00000' comment '冻结资金后额度值',
    promotion_amount_before decimal(22,6) not null default '0.00000' comment '优惠前额度值',
    promotion_amount_change decimal(22,6) not null default '0.00000' comment '优惠额度变化值',
    promotion_amount_after decimal(22,6) not null default '0.00000' comment '优惠后额度值',
    create_by varchar(50) not null comment '创建者',
    create_time datetime not null default current_timestamp comment '创建时间',
    remark varchar(500) not null default '' comment '备注',
  primary key (id),
  unique key `uk_unique` (`bill_no`,`login_name`,`create_time`)
);

　　　　　　我想很多人都会有相同的感觉吧，但是从拓展性出发，分 3 条记录的做法更好，为什么呢？如果后续新增 1 种或多种另外的额度类型，上述的 1 条记录的表设计就需要新增字段来适配了，但是 3 条记录的做法，只需要拓展credit_type的值就好了，表无需改动。各有利弊，如何选择，需要团队协商之后做出最好的选择。

　　　　　　最终项目中采用的还是 3 条记录存放 3 个额度的方式，没有采用我说的；原因是：大家都认为效率影响不大，也容易理解，关键是拓展性很好，后续很方便就能加入新的额度类型。

　　　　业务上的协调

　　　　　　最后我们再回到需求上来，这个存款渠道真的有必要显示在额度记录吗？

　　　　　　1、对公司来说，存款记录越多，那肯定是越好，但我们从实际出发，存款记录在额度记录中占的比例大吗，这个相信大家也都能想象得到，比例非常低，可能 100 条记录中会有 1 条；

　　　　　　2、本来就有单独的存款页面展示顾客的存款，去专门的存款记录页面看岂不是更直观？

　　　　　　最后和产品讨论，还真把这一列给拿掉了，那么我们也就不需要关联存款表来查了，sql 更简单，效率也更高了！

总结

　　1、sql 行转列，往往是 group by 配合聚合函数（sum、max、min等）来实现，当然也包括 if 和 case...when....then；

　　2、索引是提高查询效率的最有效的、也是最常用的方式，我们对查询的优化都要往索引上靠，explain 可以查看sql的执行计划，我们可以从中获取sql优化的提示；

　　3、一定要结合业务来写出高效的sql

　　　　可能很多小伙伴会有这样的不满：上述的 3 个额度的例子有点特殊，不具备通用性，上述高效的sql也只是在你（楼主）的项目中有效。你说的对，但是我们要知道，技术本身就是用来服务业务的，脱离了业务，技术有什么实际意义？但是我们回过头去细看，我举的例子真的就特殊到独一无二？我想还算比较通用吧，还是能套用很多场景的。

　　4、要敢于质疑需求

　　　　虽然大多数时候产品都考虑比较周到，但也不能完全保证他没有不犯迷糊的时候吧。有疑惑就向产品问清楚，我们实现的也更快、更准确。一定不要对疑问藏着掖着，以我亲身经历来讲，很多时候开发认为的都是对的，如果藏着掖着，那你就准备返工改成你之前认为的那样吧！

上一篇：用 Hystrix 构建高可用服务架构

下一篇： JavaEE高级-Spring Data学习笔记