(Les20 Analytics Functions)[20180108]
程序员文章站
2022-03-29 13:41:06
学习目标 -了解分析函数作用和类型 -使用分析函数产生报告 分析函数 分析函数用于计算一些基于组的聚合值,它与聚合函数的区别在于,分析函数每组返回多行,聚合函数每组返回一行。 一般分析函数 ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回 ......
学习目标
-了解分析函数作用和类型
-使用分析函数产生报告
分析函数
分析函数用于计算一些基于组的聚合值,它与聚合函数的区别在于,分析函数每组返回多行,聚合函数每组返回一行。
一般分析函数
ROW_NUMBER() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录生成唯一编号
RANK() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录排序,会跳号
DENSE_RANK() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录排序,不跳号
COUNT() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录进行计数
MAX() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录计算最大值
MIN() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录计算最小值
SUM() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录进行求和
AVG() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录求出平均值
FIRST_VALUE() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录的第一个值
LAST_VALUE() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录的最后一个值
LAG() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录上偏移值
LEAD() OVER(PARTITION BY ... ORDER BY ...) 按分区或返回记录下偏移值
函数语法说明
PARITION BY
Function_name(...) Over(partition by col_name)
用于分区,按列进行分区
ORDER BY (默认窗口)
Function_name(...) Over(Order by col_name [rows | range between n|unbounded preceding and n| unbounded following])
-rows:【行】前n行、后n行或当前行,用于计算
-range:【范围】大于或小于当前值的n行,或使用前n行来计算
-unbounded:【*限】所有行都使用计算
ROW_NUMBER() OVER(partition by ... order by ...)
功能与rownum伪列类似,order by子句中指定有序列,从1开始为分区中的每一行或查询返回的每一行分配一个唯一的编号。
注意ROW_NUMBER(这里不能限定列名)
查询按部门代码求出薪水排名: 13:49:52 SQL> set pagesize 500 13:50:09 SQL> col last_name format a20 13:50:09 SQL> select last_name,department_id,salary,row_number() over(partition by department_id order by salary ) row_num 13:50:09 2 from employees; LAST_NAME DEPARTMENT_ID SALARY ROW_NUM -------------------- ------------- ---------- ---------- Whalen 10 4390 1 Fay 20 5990 1 Hartstein 20 12990 2 Colmenares 30 2490 1 Himuro 30 2590 2 Tobias 30 2790 3 Baida 30 2890 4 Khoo 30 3090 5 Raphaely 30 10990 6 Mavris 40 6490 1
RANK() OVER(PARTITION BY ... ORDER BY ...) 为查询返回的每一行并列排序,相同排名后的排名会出现跳号
查询部门代码50,工资在3000~6000之间的排名情况 13:56:59 SQL> set pagesize 500 13:57:30 SQL> col last_name format a20 13:57:30 SQL> select last_name,department_id,salary,rank() over(partition by department_id order by salary ) row_num 13:57:30 2 from employees 13:57:30 3 where department_id =50 13:57:30 4 and salary between 3000 and 6000; LAST_NAME DEPARTMENT_ID SALARY ROW_NUM -------------------- ------------- ---------- ---------- Fleaur 50 3090 1 Walsh 50 3090 1 Davies 50 3090 1 Nayer 50 3190 4--出现跳号 McCain 50 3190 4 Taylor 50 3190 4 Stiles 50 3190 4 Bissot 50 3290 8--出现跳号 Mallin 50 3290 8 Dellinger 50 3390 10--出现跳号 Rajs 50 3490 11 Dilly 50 3590 12 Ladwig 50 3590 12 Chung 50 3790 14--出现跳号 Everett 50 3890 15 Bell 50 3990 16 Bull 50 4090 17 Sarchand 50 4190 18 Mourgos 50 5790 19 已選取 19 個資料列.
DENSE_RANK() OVER(PARTITION BY ... ORDER BY ...) 为查询返回的每一行并列排序,相同排名后的排名不会跳号
查询部门代码50,工资在3000~6000之间的排名情况 14:01:48 SQL> set pagesize 500 14:02:06 SQL> col last_name format a20 14:02:06 SQL> select last_name,department_id,salary,dense_rank() over(partition by department_id order by salary ) row_num 14:02:06 2 from employees 14:02:06 3 where department_id =50 14:02:06 4 and salary between 3000 and 6000; LAST_NAME DEPARTMENT_ID SALARY ROW_NUM -------------------- ------------- ---------- ---------- Fleaur 50 3090 1 Walsh 50 3090 1 Davies 50 3090 1 Nayer 50 3190 2 McCain 50 3190 2 Taylor 50 3190 2 Stiles 50 3190 2 Bissot 50 3290 3 Mallin 50 3290 3 Dellinger 50 3390 4 Rajs 50 3490 5 Dilly 50 3590 6 Ladwig 50 3590 6 Chung 50 3790 7 Everett 50 3890 8 Bell 50 3990 9 Bull 50 4090 10 Sarchand 50 4190 11 Mourgos 50 5790 12 已選取 19 個資料列.
COUNT() OVER(PARTITION BY ... ORDER BY ...) 返回查询记录或分区的计数值(次数)
14:11:48 SQL> set pagesize 500 14:12:54 SQL> col last_name format a20 14:12:54 SQL> select last_name,department_id,salary,count(salary) over( partition by department_id ) count_num 14:12:54 2 from employees; LAST_NAME DEPARTMENT_ID SALARY COUNT_NUM -------------------- ------------- ---------- ---------- Whalen 10 4390 1--部门10中出现了1次 Hartstein 20 12990 2--部门20中出现了2次 Fay 20 5990 2--部门20中出现了2次 Raphaely 30 10990 6--部门30中出现了6次 Khoo 30 3090 6 Baida 30 2890 6 Tobias 30 2790 6 Himuro 30 2590 6 Colmenares 30 2490 6 Mavris 40 6490 1 Weiss 50 7990 45 Fripp 50 8190 45 Kaufling 50 7890 45 Vollman 50 6490 45
MAX() OVER(PARTITION BY ... ORDER BY ...) 按分区返回最大的值
14:12:55 SQL> set pagesize 500 14:15:59 SQL> col last_name format a20 14:15:59 SQL> select last_name,department_id,salary,max(salary) over( partition by department_id ) max_sal 14:15:59 2 from employees 14:15:59 3 ; LAST_NAME DEPARTMENT_ID SALARY MAX_SAL -------------------- ------------- ---------- ---------- Whalen 10 4390 4390--部门10薪水最高的 Hartstein 20 12990 12990--部门20薪水最高的 Fay 20 5990 12990--部门20薪水最高的 Raphaely 30 10990 10990--部门30薪水最高的 Khoo 30 3090 10990 Baida 30 2890 10990 Tobias 30 2790 10990 Himuro 30 2590 10990 Colmenares 30 2490 10990 Mavris 40 6490 6490--部门40薪水最高的 Weiss 50 7990 8190--部门50薪水最高的 Fripp 50 8190 8190 Kaufling 50 7890 8190 Vollman 50 6490 8190 Mourgos 50 5790 8190
MIN() OVER(PARTITION BY ... ORDER BY ...) 按分区返回最小的值
14:16:00 SQL> set pagesize 500 14:18:10 SQL> col last_name format a20 14:18:10 SQL> select last_name,department_id,salary,min(salary) over( partition by department_id ) min_sal 14:18:10 2 from employees; LAST_NAME DEPARTMENT_ID SALARY MIN_SAL -------------------- ------------- ---------- ---------- Whalen 10 4390 4390 Hartstein 20 12990 5990 Fay 20 5990 5990 Raphaely 30 10990 2490 Khoo 30 3090 2490 Baida 30 2890 2490 Tobias 30 2790 2490 Himuro 30 2590 2490 Colmenares 30 2490 2490 Mavris 40 6490 6490 Weiss 50 7990 2090 Fripp 50 8190 2090 Kaufling 50 7890 2090 Vollman 50 6490 2090 Mourgos 50 5790 2090
SUM() OVER(PARTITION BY ... ORDER BY ...) 按分区汇总求和
查询所有薪水累积相加 14:32:10 SQL> set pagesize 500 14:32:40 SQL> col last_name format a20 14:32:40 SQL> select last_name,department_id,salary,sum(salary) over( order by salary ) sum_sal 14:32:40 2 from employees 14:32:40 3 where department_id=30; LAST_NAME DEPARTMENT_ID SALARY SUM_SAL -------------------- ------------- ---------- ---------- Colmenares 30 2490 2490 Himuro 30 2590 5080 Tobias 30 2790 7870 Baida 30 2890 10760 Khoo 30 3090 13850 Raphaely 30 10990 24840 已選取 6 個資料列. 查询按部门分区进行求和 14:19:28 SQL> set pagesize 500 14:20:06 SQL> col last_name format a20 14:20:06 SQL> select last_name,department_id,salary,sum(salary) over( partition by department_id ) sum_sal 14:20:06 2 from employees; LAST_NAME DEPARTMENT_ID SALARY SUM_SAL -------------------- ------------- ---------- ---------- Whalen 10 4390 4390 Hartstein 20 12990 18980 Fay 20 5990 18980 Raphaely 30 10990 24840 Khoo 30 3090 24840 Baida 30 2890 24840 Tobias 30 2790 24840 Himuro 30 2590 24840 Colmenares 30 2490 24840 Mavris 40 6490 6490 Weiss 50 7990 155950 Fripp 50 8190 155950 Kaufling 50 7890 155950 查询按部门分区前后2笔进行求和 【窗口期】 14:24:33 SQL> set pagesize 500 14:24:59 SQL> col last_name format a20 14:24:59 SQL> select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary rows between 2 preceding and 2 following ) sum_sal 14:24:59 2 from employees; LAST_NAME DEPARTMENT_ID SALARY SUM_SAL -------------------- ------------- ---------- ---------- Whalen 10 4390 4390 --部门只有1行 Fay 20 5990 18980 Hartstein 20 12990 18980 Colmenares 30 2490 7870--部门30,前后2行求和,7870=2490+2590+2790 Himuro 30 2590 10760--部门30,前后2行求和,10760 =2490+2590+2790+2890 Tobias 30 2790 13850--部门30,前后2行求和,13850 =2490+2590+2790+2890+3090 Baida 30 2890 22350--部门30,前后2行求和,22350 =2590+2790+2890+3090+10990 Khoo 30 3090 19760 Raphaely 30 10990 16970 Mavris 40 6490 6490 Olson 50 2090 6470 Philtanker 50 2190 8860 Markle 50 2190 11250 Gee 50 2390 11650 Landry 50 2390 11950 Patel 50 2490 12250 Vargas 50 2490 12350 Marlow 50 2490 12450 Perkins 50 2490 12550 Sullivan 50 2490 12650 OConnell 50 2590 12750 范围在200以内的【窗口期】 set pagesize 500 col last_name format a20 select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary range between 200 preceding and 200 following ) sum_sal from employees; 范围无限制【窗口期】 set pagesize 500 col last_name format a20 select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary range between unbounded preceding and unbounded following ) sum_sal from employees; set pagesize 500 col last_name format a20 select last_name,department_id,salary,sum(salary) over( partition by department_id order by salary rows between unbounded preceding and unbounded following ) sum_sal from employees;
AVG() OVER(PARTITION BY ... ORDER BY ...) 按分区求出平均值
14:39:56 SQL> set pagesize 500 14:39:56 SQL> col last_name format a20 14:39:56 SQL> select last_name,department_id,salary,avg(salary) over( partition by department_id ) avg_sal 14:39:56 2 from employees; LAST_NAME DEPARTMENT_ID SALARY AVG_SAL -------------------- ------------- ---------- ---------- Whalen 10 4390 4390 Hartstein 20 12990 9490 Fay 20 5990 9490 Raphaely 30 10990 4140 Khoo 30 3090 4140 Baida 30 2890 4140 Tobias 30 2790 4140 Himuro 30 2590 4140 Colmenares 30 2490 4140 Mavris 40 6490 6490 Weiss 50 7990 3465.55556 Fripp 50 8190 3465.55556 Kaufling 50 7890 3465.55556 Vollman 50 6490 3465.55556 Mourgos 50 5790 3465.55556FIRST_VALUE() OVER(PARTITION BY ... ORDER BY ....) 返回一组有序的值中的第一个值。 如果集合中的第一个值为空,则函数将返回null,除非指定IGNORE NULLS LAST_VALUE() OVER(PARTITION BY ... ORDER BY ....) 返回一组有序的值中的最后一个值。 如果集合中的最后一个值为空,则函数将返回null,除非指定IGNORE NULLS。
14:49:16 SQL> set pagesize 500 14:50:48 SQL> col last_name format a20 14:50:48 SQL> select last_name,department_id,salary,first_value(salary) over( partition by department_id) first_sal, 14:50:48 2 last_value(salary) over( partition by department_id ) last_sal 14:50:48 3 from employees; LAST_NAME DEPARTMENT_ID SALARY FIRST_SAL LAST_SAL -------------------- ------------- ---------- ---------- ---------- Whalen 10 4390 4390 4390 Hartstein 20 12990 12990 5990 Fay 20 5990 12990 5990 Raphaely 30 10990 10990 2490 Khoo 30 3090 10990 2490 Baida 30 2890 10990 2490 Tobias 30 2790 10990 2490 Himuro 30 2590 10990 2490 Colmenares 30 2490 10990 2490 Mavris 40 6490 6490 6490 Weiss 50 7990 7990 2590 Fripp 50 8190 7990 2590 Kaufling 50 7890 7990 2590LAG(column,n,default) OVER(PARTITION BY ... ORDER BY ...) 上偏移位置的返回值,n表示偏移量(正整数),default指定默认值 LEAD(column,n,default) OVER(PARTITION BY ... ORDER BY ...) 下偏移位置的返回值,n表示偏移量(正整数) ,default指定默认值
14:58:21 SQL> set pagesize 500 14:58:33 SQL> col last_name format a20 14:58:33 SQL> select last_name,department_id,lag(salary,1,0) over( partition by department_id order by salary) lag_sal, 14:58:33 2 salary, 14:58:33 3 lead(salary,1,NULL) over( partition by department_id order by salary) lead_sal 14:58:33 4 from employees; LAST_NAME DEPARTMENT_ID LAG_SAL SALARY LEAD_SAL -------------------- ------------- ---------- ---------- ---------- Whalen 10 0 4390 Fay 20 0 5990 12990 Hartstein 20 5990 12990 Colmenares 30 0 2490 2590 Himuro 30 2490 2590 2790 Tobias 30 2590 2790 2890 Baida 30 2790 2890 3090 Khoo 30 2890 3090 10990 Raphaely 30 3090 10990 Mavris 40 0 6490 Olson 50 0 2090 2190 Philtanker 50 2090 2190 2190 Markle 50 2190 2190 2390 Gee 50 2190 2390 2390学习总结: 1.了解分析函数和聚合(组)函数之间的差异 2.一般分析函数大致分为4类 伪列类:row_number() over()、rank() over()和dense_rank() over() 计算类:sum() over()、count() over()、max() over()、min() over()和avg() over() 返回值类:first_value() over()和last_value() over() 偏移类:lag() over()和lead() over
下一篇: php生成0到1(小数)的随机数