欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

扩展group by语句

程序员文章站 2022-06-25 08:43:04
学习自《剑破冰山 Oracle开发艺术》第五章 报表开发之扩展GROUP BY对于简单group by语句很难对复杂维度进行分析,难以达到实际生产的复杂报表需求,group by的扩展特性就需要了,union语句也可以达到需求但是sql复杂且效率低1 rollup多维汇总rollup,分组先进行常规... ......

学习自《剑破冰山 oracle开发艺术》第五章 报表开发之扩展group by


对于简单group by语句很难对复杂维度进行分析,难以达到实际生产的复杂报表需求,group by的扩展特性就需要了,union语句也可以达到需求但是sql复杂且效率低

1 rollup多维汇总

rollup,分组先进行常规分组,然后在此基础上,通过将列从右向左移动,然后进行更高一级的小计,最后合计,注意rollup分组和列的顺序相关

指定n列,有n+1种分组方式

部分rollup可以剔除某些不需要的小计和合计

例子

[oracle@localhost ~]$ sqlplus scott/tiger;

sql*plus: release 11.2.0.4.0 production on mon mar 23 10:31:24 2020

copyright (c) 1982, 2013, oracle.  all rights reserved.


connected to:
oracle database 11g enterprise edition release 11.2.0.4.0 - 64bit production
with the partitioning, olap, data mining and real application testing options

10:31:24 scott@edw> set autotrace on
10:31:30 scott@edw> select a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by rollup(a.dname,b.job);

dname          job          sum_sal
-------------- --------- ----------
sales          clerk            950
sales          manager         2850
sales          salesman        5600
sales                          9400
research       clerk           1900
research       analyst         6000
research       manager         2975
research                      10875
accounting     clerk           1300
accounting     manager         2450
accounting     president       5000
accounting                     8750
                              29025

13 rows selected.

elapsed: 00:00:00.01

execution plan
----------------------------------------------------------
plan hash value: 3067950682

-----------------------------------------------------------------------------------------
| id  | operation                     | name    | rows  | bytes | cost (%cpu)| time     |
-----------------------------------------------------------------------------------------
|   0 | select statement              |         |    14 |   392 |     7  (29)| 00:00:01 |
|   1 |  sort group by rollup         |         |    14 |   392 |     7  (29)| 00:00:01 |
|   2 |   merge join                  |         |    14 |   392 |     6  (17)| 00:00:01 |
|   3 |    table access by index rowid| dept    |     4 |    52 |     2   (0)| 00:00:01 |
|   4 |     index full scan           | pk_dept |     4 |       |     1   (0)| 00:00:01 |
|*  5 |    sort join                  |         |    14 |   210 |     4  (25)| 00:00:01 |
|   6 |     table access full         | emp     |    14 |   210 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

predicate information (identified by operation id):
---------------------------------------------------

   5 - access("a"."deptno"="b"."deptno")
       filter("a"."deptno"="b"."deptno")


statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
        913  bytes sent via sql*net to client
        524  bytes received via sql*net from client
          2  sql*net roundtrips to/from client
          2  sorts (memory)
          0  sorts (disk)
         13  rows processed

10:31:34 scott@edw> 

可以看出仅仅dept和emp表均仅扫描一次,而如果是union来写就会多次重复扫描,效率低

通过执行计划看到有个隐藏操作sort group by rollup ,显示结果有序,一般还是要显示排序的,默认的排序不一定符合业务需求

rollup分组具有方向性

如果使用hint:expand_gset_to_union,则优化器会将rollup转换为对应的union all操作,其他的grouping sets、cube也可以


部分rollup分组,将不需要小计的列从rollup拿出到group by中即可,当然合计也没有了

例子

10:31:34 scott@edw> set autotrace off
10:43:49 scott@edw> select to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by to_char(b.hiredate,'yyyy'),a.dname,rollup(b.job);

hire dname          job          sum_sal
---- -------------- --------- ----------
1980 research       clerk            800
1980 research                        800
1981 sales          clerk            950
1981 sales          manager         2850
1981 sales          salesman        5600
1981 sales                          9400
1981 research       analyst         3000
1981 research       manager         2975
1981 research                       5975
1981 accounting     manager         2450
1981 accounting     president       5000
1981 accounting                     7450
1982 accounting     clerk           1300
1982 accounting                     1300
1987 research       clerk           1100
1987 research       analyst         3000
1987 research                       4100

17 rows selected.

elapsed: 00:00:00.01
10:43:53 scott@edw> 

2 cube交叉报表

cube分组可以实现更精细复杂的统计,对不同维度的所以可能进行分析,生成交叉报表,cube分组,是从n列中先进行合计,即一个列不取,然后小计,即取1列到n-1列,最后n列全取,即标准分组

因为包含所有可能的组合,所以结果与列的顺序无关,列顺序仅仅影响默认的隐藏排序而已,如果用了显示排序则无所谓了

cube分组增加一列,可能结果是指数级的增长,分组种类2的n次方

语法类似,例子

11:02:40 scott@edw> set autotrace on
11:02:48 scott@edw>  select a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by cube(a.dname,b.job);

dname          job          sum_sal
-------------- --------- ----------
                              29025
               clerk           4150
               analyst         6000
               manager         8275
               salesman        5600
               president       5000
sales                          9400
sales          clerk            950
sales          manager         2850
sales          salesman        5600
research                      10875
research       clerk           1900
research       analyst         6000
research       manager         2975
accounting                     8750
accounting     clerk           1300
accounting     manager         2450
accounting     president       5000

18 rows selected.

elapsed: 00:00:00.01

execution plan
----------------------------------------------------------
plan hash value: 2382666110

-------------------------------------------------------------------------------------------
| id  | operation                       | name    | rows  | bytes | cost (%cpu)| time     |
-------------------------------------------------------------------------------------------
|   0 | select statement                |         |    14 |   392 |     7  (29)| 00:00:01 |
|   1 |  sort group by                  |         |    14 |   392 |     7  (29)| 00:00:01 |
|   2 |   generate cube                 |         |    14 |   392 |     7  (29)| 00:00:01 |
|   3 |    sort group by                |         |    14 |   392 |     7  (29)| 00:00:01 |
|   4 |     merge join                  |         |    14 |   392 |     6  (17)| 00:00:01 |
|   5 |      table access by index rowid| dept    |     4 |    52 |     2   (0)| 00:00:01 |
|   6 |       index full scan           | pk_dept |     4 |       |     1   (0)| 00:00:01 |
|*  7 |      sort join                  |         |    14 |   210 |     4  (25)| 00:00:01 |
|   8 |       table access full         | emp     |    14 |   210 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

predicate information (identified by operation id):
---------------------------------------------------

   7 - access("a"."deptno"="b"."deptno")
       filter("a"."deptno"="b"."deptno")


statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       1175  bytes sent via sql*net to client
        535  bytes received via sql*net from client
          3  sql*net roundtrips to/from client
          3  sorts (memory)
          0  sorts (disk)
         18  rows processed

11:02:52 scott@edw> 

可以看执行计划,结果也是有序的


部分cube分组,例子

11:06:24 scott@edw>  select a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by a.dname,cube(b.job);

dname          job          sum_sal
-------------- --------- ----------
sales                          9400
sales          clerk            950
sales          manager         2850
sales          salesman        5600
research                      10875
research       clerk           1900
research       analyst         6000
research       manager         2975
accounting                     8750
accounting     clerk           1300
accounting     manager         2450
accounting     president       5000

12 rows selected.

elapsed: 00:00:00.00
11:06:26 scott@edw>

3 grouping sets实现小计

rollup和cube会产生各种标准分组、小计、合计,grouping  sets则只关注指定维度的小计,n列的结果也是n种

如grouping sets(a,b,c)就是group by a、group by b和group by c的结果union all

例子

11:06:26 scott@edw>  set autotrace on
11:12:33 scott@edw> select to_char(b.hiredate,'yyyy') hire_year,a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by grouping sets( to_char(b.hiredate,'yyyy'),a.dname,b.job);

hire dname          job          sum_sal
---- -------------- --------- ----------
                    clerk           4150
                    salesman        5600
                    president       5000
                    manager         8275
                    analyst         6000
     accounting                     8750
     research                      10875
     sales                          9400
1987                                4100
1980                                 800
1982                                1300
1981                               22825

12 rows selected.

elapsed: 00:00:00.01

execution plan
----------------------------------------------------------
plan hash value: 2825031421

------------------------------------------------------------------------------------------------------------
| id  | operation                      | name                      | rows  | bytes | cost (%cpu)| time     |
------------------------------------------------------------------------------------------------------------
|   0 | select statement               |                           |    14 |   448 |    17  (24)| 00:00:01 |
|   1 |  temp table transformation     |                           |       |       |            |          |
|   2 |   load as select               | sys_temp_0fd9d660d_29b9bb |       |       |            |          |
|   3 |    merge join                  |                           |    14 |   504 |     6  (17)| 00:00:01 |
|   4 |     table access by index rowid| dept                      |     4 |    52 |     2   (0)| 00:00:01 |
|   5 |      index full scan           | pk_dept                   |     4 |       |     1   (0)| 00:00:01 |
|*  6 |     sort join                  |                           |    14 |   322 |     4  (25)| 00:00:01 |
|   7 |      table access full         | emp                       |    14 |   322 |     3   (0)| 00:00:01 |
|   8 |   load as select               | sys_temp_0fd9d660e_29b9bb |       |       |            |          |
|   9 |    hash group by               |                           |     5 |    60 |     3  (34)| 00:00:01 |
|  10 |     table access full          | sys_temp_0fd9d660d_29b9bb |    14 |   168 |     2   (0)| 00:00:01 |
|  11 |   load as select               | sys_temp_0fd9d660e_29b9bb |       |       |            |          |
|  12 |    hash group by               |                           |     4 |    56 |     3  (34)| 00:00:01 |
|  13 |     table access full          | sys_temp_0fd9d660d_29b9bb |    14 |   196 |     2   (0)| 00:00:01 |
|  14 |   load as select               | sys_temp_0fd9d660e_29b9bb |       |       |            |          |
|  15 |    hash group by               |                           |     1 |     8 |     3  (34)| 00:00:01 |
|  16 |     table access full          | sys_temp_0fd9d660d_29b9bb |    14 |   112 |     2   (0)| 00:00:01 |
|  17 |   view                         |                           |     5 |   160 |     2   (0)| 00:00:01 |
|  18 |    table access full           | sys_temp_0fd9d660e_29b9bb |     5 |    60 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

predicate information (identified by operation id):
---------------------------------------------------

   6 - access("sys_tbl_$2$"."deptno"="sys_tbl_$1$"."deptno")
       filter("sys_tbl_$2$"."deptno"="sys_tbl_$1$"."deptno")


statistics
----------------------------------------------------------
         23  recursive calls
         33  db block gets
         39  consistent gets
          4  physical reads
       2172  redo size
        962  bytes sent via sql*net to client
        524  bytes received via sql*net from client
          2  sql*net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         12  rows processed

11:12:36 scott@edw> 

执行计划可以看出,没有默认排序了,无序,和列的顺序也无关


同理部分grouping sets分组,例子

11:12:36 scott@edw> set autotrace off
11:17:03 scott@edw> select a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by a.dname,grouping sets(to_char(b.hiredate,'yyyy'),b.job);

dname          hire job          sum_sal
-------------- ---- --------- ----------
sales               manager         2850
sales               clerk            950
accounting          manager         2450
accounting          president       5000
accounting          clerk           1300
research            manager         2975
sales               salesman        5600
research            analyst         6000
research            clerk           1900
research       1981                 5975
sales          1981                 9400
research       1987                 4100
accounting     1981                 7450
accounting     1982                 1300
research       1980                  800

15 rows selected.

elapsed: 00:00:00.01
11:17:05 scott@edw> 

注意此时的含义有较大的变化

cube、rollup作为grouping sets的参数

grouping sets只提供单列分组,没有合计功能,如果需要提供合计,则可以将rollup或cube作为参数,例子


11:23:59 scott@edw>  select a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by grouping sets(rollup(a.dname),rollup(b.job));

dname          job          sum_sal
-------------- --------- ----------
               clerk           4150
               salesman        5600
               president       5000
               manager         8275
               analyst         6000
accounting                     8750
research                      10875
sales                          9400
                              29025
                              29025

10 rows selected.

elapsed: 00:00:00.02
11:24:02 scott@edw> 

问题是产生了两个合计行,因为rollup或cube作为grouping sets参数,相当于每个rollup或cube操作的union all,等价于扩展group by语句这就很好理解功能了

对于重复合计,使用distinct剔除即可,另外后面还有特殊的函数可以使用,group_id可以用来剔除重复分组(和distinct功能是不一样的)

rollup和cube作为参数也可以混用,而且也可以使用其它扩展功能,如部分分组、复合列分组、连接分组等

rollup和cube不能接受grouping sets作为参数,rollup和cube互相作为参数也不行

4 组合列分组、连接分组、重置列分组

组合列分组、连接分组在复杂报表中用处很大。组合列分组用于剔除不必要的小计保留合计,连接分组按每个分组的笛卡尔积进行操作,分组更多更细。对于常规分组满足不了的需求可以考虑

组合列即将多个列当做整体对待,下列对比表可以清晰展示不同之处

扩展group by语句连接分组更强大,允许group by后出现多个rollup、cube和grouping sets操作,这样分组级别更多,报表更精细,实现很复杂的需求扩展group by语句实际上不管是同类型的连接分组还是不通类型的连接分组之间,最后的分组级别种类都是每个扩展分组级别种类的乘积,分组级别是笛卡尔积,比如rollup(a,b),rollup(c),最终3*2=6中分组级别


重复列分组也就是group by中允许重复列,比如group by rollup(a,(a,b))、group by a,rollup(a,b)

组合列分组

例子

14:48:13 scott@edw> select a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by rollup(a.dname,(to_char(b.hiredate,'yyyy'),b.job));

dname          hire job          sum_sal
-------------- ---- --------- ----------
sales          1981 clerk            950
sales          1981 manager         2850
sales          1981 salesman        5600
sales                               9400
research       1980 clerk            800
research       1981 analyst         3000
research       1981 manager         2975
research       1987 clerk           1100
research       1987 analyst         3000
research                           10875
accounting     1981 manager         2450
accounting     1981 president       5000
accounting     1982 clerk           1300
accounting                          8750
                                   29025

15 rows selected.

elapsed: 00:00:00.00
14:48:16 scott@edw> 

组合列分组可以实现部分rollup和部分cube分组类似效果并且加上合计

但是这个也比较麻烦,对于需要cube、rollup合计并剔除部分小计的需求用grouping_id或grouping函数

cube和rollup均可以转换为对应的grouping sets

当然反向也可以,不过意义不大

连接分组

例子

14:48:16 scott@edw>  select a.dname,to_char(b.hiredate,'yyyy') hire_year,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by rollup(a.dname,b.job),rollup(to_char(b.hiredate,'yyyy'));

dname          hire job          sum_sal
-------------- ---- --------- ----------
sales               clerk            950
sales               manager         2850
sales               salesman        5600
sales                               9400
research            clerk           1900
research            analyst         6000
research            manager         2975
research                           10875
accounting          clerk           1300
accounting          manager         2450
accounting          president       5000
accounting                          8750
                                   29025
research       1980 clerk            800
research       1980                  800
               1980                  800
sales          1981 clerk            950
sales          1981 manager         2850
sales          1981 salesman        5600
sales          1981                 9400
research       1981 analyst         3000
research       1981 manager         2975
research       1981                 5975
accounting     1981 manager         2450
accounting     1981 president       5000
accounting     1981                 7450
               1981                22825
accounting     1982 clerk           1300
accounting     1982                 1300
               1982                 1300
research       1987 clerk           1100
research       1987 analyst         3000
research       1987                 4100
               1987                 4100

34 rows selected.

elapsed: 00:00:00.01
14:57:57 scott@edw> 

相当于两个rollup的笛卡尔积

理解了之后,利用连接分组,cube可以用rollup转换,如cube(a,b,c)等于rollup(a),rollup(b),rollup(c),但是对于rollup和grouping sets转换为cube一般没啥用

连接分组一般是同类型的,不通类型的连接分组一般不常用

重复列分组

例子

14:57:57 scott@edw>   select a.dname,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by a.dname,rollup(a.dname,b.job);

dname          job          sum_sal
-------------- --------- ----------
sales          clerk            950
sales          manager         2850
sales          salesman        5600
research       clerk           1900
research       analyst         6000
research       manager         2975
accounting     clerk           1300
accounting     manager         2450
accounting     president       5000
sales                          9400
research                      10875
accounting                     8750
sales                          9400
research                      10875
accounting                     8750

15 rows selected.

elapsed: 00:00:00.00
15:07:14 scott@edw> 

没啥意义的例子,只不过说明语法允许

5 三个扩展分组函数:grouping、grouping_id、group_id

三个扩展分组函数:grouping、grouping_id、group_id在生成有意义的报表、结果进行过滤、排序中有很重要的作用,常用于复杂的报表查询

注意grouping和grouping_id函数的参数不能是组合列

grouping函数用于制作有意义的报表

grouping_id函数对结果过滤以及排序

group_id函数剔除重复行

grouping函数

在扩展group by子句来说,null表示小计或者合计,但是如果数据中本来就有null值呢?grouping函数专门处理扩展group by分组中null问题:

    它只接受一个参数,且参数来自rollup、cube、grouping sets中的列。当然也可以在group by而不在上述3个子句的列,不过结果肯定是0,没有意义

    grouping函数对于小计或合计的列返回1,否则返回0。用于区别是否原始数据中含null,常与decode一起使用。当然也可以确定分组级别从而过滤一些行,不过会很烦,一般用grouping_id替代

例子

15:34:01 scott@edw>  select decode(grouping(a.dname),1,'全部部门',a.dname) dname,decode(grouping(b.mgr),1,'全部老板',b.mgr) mgr,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by rollup(a.dname,b.mgr);

dname          mgr                                         sum_sal
-------------- ---------------------------------------- ----------
sales          7698                                           6550
sales          7839                                           2850
sales          全部老板                                       9400
research       7566                                           6000
research       7788                                           1100
research       7839                                           2975
research       7902                                            800
research       全部老板                                      10875
accounting                                                    5000
accounting     7782                                           1300
accounting     7839                                           2450
accounting     全部老板                                       8750
全部部门       全部老板                                      29025

13 rows selected.

elapsed: 00:00:00.01
15:34:12 scott@edw> 

grouping_id函数

用于过滤分组级别和排序结果

可以接受多个参数,来自rollup、cube、grouping sets中的列,按列从左往右顺序计算,是分组列则0,是小计或合计列为1,然后组合成为一个二进制数字叫做位向量,位向量转化为10进制即最后的结果,代表分组级别,如cube(a,b),那么grouping_id(a,b)代表的如下

扩展group by语句grouping_id的好处是可以对多列进行计算得到分组级别

例子

15:46:26 scott@edw>  select a.dname,b.mgr,b.job,sum(b.sal) sum_sal from dept a,emp b where a.deptno=b.deptno group by rollup(a.dname,b.mgr,b.job) having grouping_id(a.dname,b.mgr,b.job) in (0,7);

dname                 mgr job          sum_sal
-------------- ---------- --------- ----------
sales                7698 clerk            950
sales                7698 salesman        5600
sales                7839 manager         2850
research             7566 analyst         6000
research             7788 clerk           1100
research             7839 manager         2975
research             7902 clerk            800
accounting                president       5000
accounting           7782 clerk           1300
accounting           7839 manager         2450
                                         29025

11 rows selected.

elapsed: 00:00:00.00
15:46:29 scott@edw> 

group_id函数

group_id无参数,因为扩展group by子句允许多种复杂分组操作,有时候为了实现复杂报表,可能出现重复统计,而group_id函数可以区分重复分组结果,第一次出现为0,以后每次出现增1,group_id在select中出现没啥意义,通常用于having子句剔除重复统计

例子

15:46:29 scott@edw>  select a.dname,b.job,sum(b.sal) sum_sal,group_id() gi from dept a,emp b where a.deptno=b.deptno group by grouping sets(rollup(a.dname),rollup(b.job)) having group_id()=0;

dname          job          sum_sal         gi
-------------- --------- ---------- ----------
               clerk           4150          0
               salesman        5600          0
               president       5000          0
               manager         8275          0
               analyst         6000          0
accounting                     8750          0
research                      10875          0
sales                          9400          0
                              29025          0

9 rows selected.

elapsed: 00:00:00.01
15:55:55 scott@edw>