sql实现寻找中位数(使用sign、case、自定义变量等)
sql实现寻找中位数
思路1
对于一个 奇数 长度数组中的 中位数,大于这个数的数值个数等于小于这个数的数值个数。
算法
根据上述的定义,我们来找一下 [1, 3, 2] 中的中位数。首先 1 不是中位数,因为这个数组有三个元素,却有两个元素 (3,2) 大于 1。3 也不是中位数,因为有两个元素小于 3。对于最后一个 2 来说,大于 2 和 小于 2 的元素数量是相等的,因此 2 是当前数组的中位数。
当数组长度为 偶数,且元素唯一时,中位数等于排序后 中间两个数 的平均值。对这两个数来说,大于当前数的数值个数跟小于当前数的数值个数绝对值之差为 1,恰好等于这个数出现的频率。
总的来说,不管是数组长度是奇是偶,也不管元素是不是唯一,中位数出现的频率一定大于等于大于它的数和小于它的数的绝对值之差。这个规律是这道题的关键,可以通过下面这个搜索条件来过滤。
--架构
Create table If Not Exists Employee (Id int, Company varchar(255), Salary int)
Truncate table Employee
insert into Employee (Id, Company, Salary) values ('1', 'A', '2341')
insert into Employee (Id, Company, Salary) values ('2', 'A', '341')
insert into Employee (Id, Company, Salary) values ('3', 'A', '15')
insert into Employee (Id, Company, Salary) values ('4', 'A', '15314')
insert into Employee (Id, Company, Salary) values ('5', 'A', '451')
insert into Employee (Id, Company, Salary) values ('6', 'A', '513')
insert into Employee (Id, Company, Salary) values ('7', 'B', '15')
insert into Employee (Id, Company, Salary) values ('8', 'B', '13')
insert into Employee (Id, Company, Salary) values ('9', 'B', '1154')
insert into Employee (Id, Company, Salary) values ('10', 'B', '1345')
insert into Employee (Id, Company, Salary) values ('11', 'B', '1221')
insert into Employee (Id, Company, Salary) values ('12', 'B', '234')
insert into Employee (Id, Company, Salary) values ('13', 'C', '2345')
insert into Employee (Id, Company, Salary) values ('14', 'C', '2645')
insert into Employee (Id, Company, Salary) values ('15', 'C', '2645')
insert into Employee (Id, Company, Salary) values ('16', 'C', '2652')
insert into Employee (Id, Company, Salary) values ('17', 'C', '65')
SELECT
Employee.Id, Employee.Company, Employee.Salary
FROM
Employee,
Employee alias
WHERE
Employee.Company = alias.Company
GROUP BY Employee.Company , Employee.Salary
HAVING SUM(CASE
WHEN Employee.Salary = alias.Salary THEN 1
ELSE 0
END) >= ABS(SUM(SIGN(Employee.Salary - alias.Salary)))
ORDER BY Employee.Id;
--alias是别名
1.运用CASE表达式,非等值自连接和HAVING子句来找中位数
2.通过 WHERE e1.Company = e2.Company 进行分组
3.最后通过GROUP BY 去重
--更好理解一些:
select Id, Company, Salary
from Employee
where Id in (select e1.Id
from Employee e1, Employee e2
WHERE e1.Company = e2.Company
GROUP BY e1.Id
HAVING SUM(CASE WHEN e1.Salary >= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2 AND SUM(CASE WHEN e1.Salary <= e2.Salary THEN 1 ELSE 0 END) >= COUNT(*)/2)
GROUP BY Company, Salary
ORDER BY Company
case
The CASE statement goes through conditions and returns a value when the first condition is met (like an IF-THEN-ELSE statement). So, once a condition is true, it will stop reading and return the result. If no conditions are true, it returns the value in the ELSE clause.If there is no ELSE part and no conditions are true, it returns NULL.
--用例
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN conditionN THEN resultN
ELSE result
END;
sign():
这里是引用
ps:注意:在 MySQL 5.6 中,这个代码是可以运行的,但如果你用的是 MySQL 5.7+,就需要在 SELECT 语句中 把 Employee.id 改成 ANY_VALUE(Employee.Id)。
思路二:排序后再找中位数
根据 salary 排序记录,利用会话变量计算排名。由于不需要级联表,这个方法要比方法一更高效。
SELECT
Id, Company, Salary
FROM
(SELECT
e.Id,
e.Salary,
e.Company,
IF(@prev = e.Company, @Rank:=@Rank + 1, @Rank:=1) AS rank,
@prev:=e.Company
FROM
Employee e, (SELECT @Rank:=0, @prev:=0) AS temp
ORDER BY e.Company , e.Salary , e.Id) Ranking
INNER JOIN
(SELECT
COUNT(*) AS totalcount, Company AS name
FROM
Employee e2
GROUP BY e2.Company) companycount ON companycount.name = Ranking.Company
WHERE
Rank = FLOOR((totalcount + 1) / 2)
OR Rank = FLOOR((totalcount + 2) / 2)
;
自定义变量
本文地址:https://blog.csdn.net/qq_43515555/article/details/107309590
上一篇: SqlServer2008实例22表基础之减少NULL列的存储空间
下一篇: MongoDB高级部分