How to Reduce Syntactic Overhead Using the SQL WINDOW Clause

Programmer Article Station (程序员文章站), 2022-04-04 21:41:30

SQL is a verbose language, and one of its most verbose features is window functions.

In a Stack Overflow question that I encountered recently, someone asked how to calculate the difference between the first and the last value in a time series for any given day:

Input

volume  tstamp
---------------------------
29011   2012-12-28 09:00:00
28701   2012-12-28 10:00:00
        2012-12-28 11:00:00
        2012-12-28 12:00:00
        2012-12-28 13:00:00
28583   2012-12-28 14:00:00
28800   2012-12-29 09:00:00
        2012-12-29 10:00:00
        2012-12-29 11:00:00
        2012-12-29 12:00:00
        2012-12-29 13:00:00
28278   2012-12-29 14:00:00
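
To try the queries in this article, the input has to be loaded into a table first. The following is only a sketch: it assumes PostgreSQL, the table name t that the queries below use, and the column types integer and timestamp, which are guesses. It inserts only the rows whose volume appears somewhere in this article, so the remaining hourly rows are left out.

```sql
-- Assumed schema: the article only shows the column names volume and tstamp.
create table t (
  volume integer   not null,
  tstamp timestamp not null
);

-- Only the rows whose volume is known; the other hourly rows are omitted.
insert into t (volume, tstamp) values
  (29011, timestamp '2012-12-28 09:00:00'),
  (28701, timestamp '2012-12-28 10:00:00'),
  (28583, timestamp '2012-12-28 14:00:00'),
  (28800, timestamp '2012-12-29 09:00:00'),
  (28278, timestamp '2012-12-29 14:00:00');
```

Because the first and last row of each day are included, this subset is enough to reproduce the first, last and difference values of the expected output.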
 

Expected output

first  last   difference  date
------------------------------------
29011  28583  428         2012-12-28
28800  28278  522         2012-12-29
 

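How to write the query

Notice that the value and timestamp progressions do not correlate. There is no rule that if timestamp2 > timestamp1 then value2 < value1. Otherwise, this simple query would work (using PostgreSQL syntax):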

select 
  max(volume)               as first,
  min(volume)               as last,
  max(volume) - min(volume) as difference,
  cast(tstamp as date)      as date
from t
group by cast(tstamp as date);
 

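There are several ways to find the first and last values within a group that do not involve window functions. For example:

In Oracle, you can use the FIRST and LAST functions, which for some mysterious reason are not written FIRST(...) WITHIN GROUP (ORDER BY ...) or LAST(...) WITHIN GROUP (ORDER BY ...), like other sorted-set aggregate functions, but some_aggregate_function(...) KEEP (DENSE_RANK FIRST ORDER BY ...). Go figure.
In PostgreSQL, you could use the DISTINCT ON syntax with ORDER BY and LIMIT

More details about the various approaches can be found here: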
https://blog.jooq.org/2017/09/22/how-to-write-efficient-top-n-queries-in-sql
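
As a rough illustration of the PostgreSQL option, here is a sketch that uses DISTINCT ON with ORDER BY (this per-day variant happens not to need LIMIT). It assumes the table t with the columns volume and tstamp from the input; the names first_rows and last_rows are made up for this example.

```sql
-- first_rows / last_rows are illustrative names, not from the article.
with first_rows as (
  -- DISTINCT ON keeps the first row of each day in ORDER BY order,
  -- i.e. the row with the earliest tstamp.
  select distinct on (cast(tstamp as date))
    cast(tstamp as date) as date,
    volume
  from t
  order by cast(tstamp as date), tstamp
),
last_rows as (
  -- Same idea, sorted descending to keep the latest row of each day.
  select distinct on (cast(tstamp as date))
    cast(tstamp as date) as date,
    volume
  from t
  order by cast(tstamp as date), tstamp desc
)
select
  f.volume            as first,
  l.volume            as last,
  f.volume - l.volume as difference,
  f.date
from first_rows f
join last_rows l on l.date = f.date
order by f.date;
```

Note that this formulation references t twice, once per sort direction.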

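The best approach would be to use an aggregate function like Oracle's, but few databases have such a function. So, we will resort to the FIRST_VALUE and LAST_VALUE window functions: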

select distinct
  first_value(volume) over (
    partition by cast(tstamp as date) 
    order by tstamp
    rows between unbounded preceding and unbounded following
  ) as first,
  last_value(volume) over (
    partition by cast(tstamp as date) 
    order by tstamp
    rows between unbounded preceding and unbounded following
  ) as last,
  first_value(volume) over (
    partition by cast(tstamp as date) 
    order by tstamp
    rows between unbounded preceding and unbounded following
  ) 
  - last_value(volume) over (
    partition by cast(tstamp as date) 
    order by tstamp
    rows between unbounded preceding and unbounded following
  ) as diff,
  cast(tstamp as date) as date
from t
order by cast(tstamp as date)
 

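Ouch.

That does not look very readable, but it will yield the correct result. Of course, we could wrap the definitions of the columns first and last in a derived table, but that would still leave us with two repetitions of the window definition: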

partition by cast(tstamp as date) 
order by tstamp
rows between unbounded preceding and unbounded following
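
For comparison, the derived-table variant mentioned above could look like the following sketch (the alias d is arbitrary and not from the article). The difference is now computed once in the outer query, but the window definition still has to be spelled out twice.

```sql
select
  first,
  last,
  first - last as diff,  -- computed from the derived table's columns
  date
from (
  select distinct
    first_value(volume) over (
      partition by cast(tstamp as date)
      order by tstamp
      rows between unbounded preceding and unbounded following
    ) as first,
    last_value(volume) over (
      partition by cast(tstamp as date)
      order by tstamp
      rows between unbounded preceding and unbounded following
    ) as last,
    cast(tstamp as date) as date
  from t
) as d
order by date;
```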
 

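The WINDOW clause to the rescue

Luckily, at least three databases have implemented the SQL standard WINDOW clause:

MySQL
PostgreSQL
Sybase SQL Anywhere

The above query can be refactored to this one: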

select distinct
  first_value(volume) over w as first,
  last_value(volume) over w as last,
  first_value(volume) over w 
    - last_value(volume) over w as diff,
  cast(tstamp as date) as date
from t
window w as (
  partition by cast(tstamp as date) 
  order by tstamp
  rows between unbounded preceding and unbounded following
)
order by cast(tstamp as date)
 

Notice how I can give a window specification a name, much like a common table expression (WITH clause) is defined:

window 
    <window-name> as (<window-specification>)
{  ,<window-name> as (<window-specification>)... }
 

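Not only can I reuse entire specifications, I can also build specifications from partial specifications and reuse only parts of them. My previous query could have been rewritten like this: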

select distinct
  first_value(volume) over w3 as first,
  last_value(volume) over w3 as last,
  first_value(volume) over w3 
    - last_value(volume) over w3 as diff,
  cast(tstamp as date) as date
from t
window 
  w1 as (partition by cast(tstamp as date)),
  w2 as (w1 order by tstamp),
  w3 as (w2 rows between unbounded preceding 
                     and unbounded following)
order by cast(tstamp as date)
 

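Each window specification can be created from scratch, or be based on a previously defined window specification. Note that this is also true when referencing the window definition. If I wanted to reuse the PARTITION BY clause and the ORDER BY clause, but change the frame clause (ROWS ...), then I could have written this: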

select distinct
  first_value(volume) over (
    w2 rows between unbounded preceding and current row
  ) as first,
  last_value(volume) over (
    w2 rows between current row and unbounded following
  ) as last,
  first_value(volume) over (
    w2 rows unbounded preceding
  ) - last_value(volume) over (
    w2 rows between 1 preceding and unbounded following
  ) as diff,
  cast(tstamp as date) as date
from t
window 
  w1 as (partition by cast(tstamp as date)),
  w2 as (w1 order by tstamp)
order by cast(tstamp as date)
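
Why the frame clause matters can be seen in a small sketch, assuming the same table t. When a window has an ORDER BY but no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so LAST_VALUE only sees rows up to the current one. The column aliases below are made up for illustration.

```sql
select
  tstamp,
  volume,
  -- Default frame ends at the current row (and its peers), so with
  -- distinct timestamps this simply returns the current row's volume.
  last_value(volume) over w as last_default_frame,
  -- Extending the frame to the end of the partition returns the
  -- last volume of the day on every row.
  last_value(volume) over (
    w rows between current row and unbounded following
  ) as last_of_day
from t
window w as (partition by cast(tstamp as date) order by tstamp)
order by tstamp;
```

FIRST_VALUE is less sensitive: any frame that starts at UNBOUNDED PRECEDING begins with the first row of the partition. That is why the differing frames in the query above still produce the same result, as long as the FIRST_VALUE frames start at the beginning of the partition and the LAST_VALUE frames end at UNBOUNDED FOLLOWING.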
