count distinct优化
系统要进行压力测试,开启漫日志查询后。 [root@ora11g mysql]# less ora11g-slow.log /usr/sbin/mysqld, Version: 5.6.12 (MySQL Community Server (GPL)). started with: Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock Time Id Command Argument
系统要进行压力测试,开启漫日志查询后。
[root@ora11g mysql]# less ora11g-slow.log
/usr/sbin/mysqld, Version: 5.6.12 (MySQL Community Server (GPL)). started with:
Tcp port: 3306 Unix socket: /var/lib/mysql/mysql.sock
Time Id Command Argument
# Time: 140508 12:15:52
# User@Host: root[root] @ [10.8.8.64] Id: 86
# Query_time: 124.894071 Lock_time: 0.000228 Rows_sent: 1 Rows_examined: 510103
use decathlon_production;
SET timestamp=1399522552;
select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
# Time: 140508 12:17:33
# User@Host: root[root] @ [10.8.8.64] Id: 91
# Query_time: 144.808880 Lock_time: 0.000330 Rows_sent: 1 Rows_examined: 510103
SET timestamp=1399522653;
select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
检索下列sql语句消耗了较多的性能,这个表是innodb存储引擎。
mysql> explain select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | customer0_ | ref | PRIMARY,ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)
而我们看下下面的几个同等含义的sql语句:
mysql> explain select count(*) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | customer0_ | ref | ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)
mysql> explain select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | SIMPLE | customer0_ | ref | ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+-------------------+-------------------+---------+-------+--------+--------------------------+
1 row in set (0.00 sec)
分别执行下看看执行时间:
mysql> select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (45.33 sec)
mysql> select count(*) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (0.70 sec)
mysql> select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (0.74 sec)
上面三个sql虽然执行计划一样ref,都是直接走customer_type字段的索引ind_CUSTOMER_TYPE,并且extra Using where; Using index的using index告诉我们这里优化器只读取了索引,并没有通过索引来回表(mysql没有oracle的索引快速扫描),但是执行时间却完全不是一个数量级的,加上distinct的sql语句足足要1分多钟。
开启profile来查看下其资源消耗的具体信息:
mysql> set profiling=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (1 min 3.71 sec)
mysql> select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' );
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (0.73 sec)
mysql> show profiles;
+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query |
+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
| 1 | 43.56143300 | select count(distinct customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ) |
| 2 | 0.68889800 | select count(customer0_.CUSTOMER_NO) as x0_0_ from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ) |
+----------+-------------+--------------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
这里的区别主要还是sending data这一步骤时间的差别,这个差别主要是cpu_user、cpu_system、context_voluntary和coutext_involuntary,加上distinct后优化器需要对其进行排序去重后才会去计算count,这个排序去重是很消耗cpu资源的,所以这里的sending data的cpu user和cpu system差别比较大。
mysql的explain不考虑各种cache和mysql在执行查询时所作的优化工作,这里mysql并没有显示排序去重的执行计划,而由于customer_no是主键,不需要进行distinct去重,所以这里直接改写应用程序的sql即可,如果不是主键可以拆分成子查询的方式来进行优化
mysql> explain select count(*) as x0_0_ from (select distinct CUSTOMER_NO from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ))a;
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
| 1 | PRIMARY |
| 2 | DERIVED | customer0_ | ref | PRIMARY,ind_CUSTOMER_TYPE | ind_CUSTOMER_TYPE | 767 | const | 258611 | Using where; Using index |
+----+-------------+------------+------+---------------------------+-------------------+---------+-------+--------+--------------------------+
2 rows in set (0.00 sec)
mysql> select count(*) as x0_0_ from (select distinct CUSTOMER_NO from TBL_CUSTOMER customer0_ where (customer0_.CUSTOMER_TYPE='200164' ))a;
+--------+
| x0_0_ |
+--------+
| 510069 |
+--------+
1 row in set (1.40 sec)
原文地址:count distinct优化, 感谢原作者分享。