Flink流式计算
structured streaming
- a stream is converted into a dynamic table.
- a continuous query is evaluated on the dynamic table yielding a new dynamic table.
- the resulting dynamic table is converted back into a stream.
defining a table on a stream
continuous queries
handling event-time
tumble(time_attr, interval),定义一个个连续的时间窗口,这样每行数据只可能出现在一个窗口内,窗口之间不会出现重叠defines a tumbling time window. a tumbling time window assigns rows to non-overlapping, continuous windows with a fixed duration (interval). for example, a tumbling window of 5 minutes groups rows in 5 minutes intervals. tumbling windows can be defined on event-time (stream + batch) or processing-time (stream).
tumble_start(time_attr, interval). 返回时间窗口的下限时间戳.returns the timestamp of the inclusive lower bound of the corresponding tumbling, hopping, or session window.
handling late data
bob 12:54:00 ./xxx 到达时间14:01:00如何处理?
watermarks定义在ctime,允许延迟2hour, 14:00:00-2hour<13:00:00,窗口12:00:00-13::00:00仍保持
watermarks定义在ctime,允许延迟5min,14:00:00-5min>13:00:00,时间窗口12:00:00-13:00:00已过期,数据被丢弃
上一篇: windows 常用命令
下一篇: Spark 中的机器学习库及示例