Section 1: Sliding-Window Word Count (Java Implementation)
2022-06-16 17:26:55
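Previous post: A brief overview of the Flink official website
Flink introductory example: WordCount
Requirements analysis:
Words are typed by hand into a socket in real time. Flink receives this data as a stream, aggregates the data that falls within a given time window (for example, 2 seconds), and prints the result computed for each window.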
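To make the windowing requirement concrete, here is a minimal plain-Java sketch (no Flink dependency) of how a single timestamp maps onto overlapping sliding windows. The class and method names are hypothetical, and the sketch assumes non-negative timestamps and windows that start at multiples of the slide interval. It illustrates the idea only and is not Flink's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of sliding-window assignment; names are hypothetical, not Flink API.
class SlidingWindowSketch {

    // Returns the start times (ms) of every window [start, start + sizeMs)
    // that contains timestampMs. Assumes timestampMs >= 0 and that windows
    // start at multiples of slideMs.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is <= the timestamp
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards while the window still covers the timestamp
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An element arriving at t = 2500 ms, window size 2 s, slide 1 s
        System.out.println(windowStarts(2500L, 2000L, 1000L)); // prints [2000, 1000]
    }
}
```

With a 2-second window that slides every second, each element lands in two windows, so the same word can contribute to two consecutive results.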
1. Create a Maven project
Add the following dependencies to the pom file:
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.6.1</version>
        <!-- <scope>provided</scope> -->
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-scala_2.11</artifactId>
        <version>1.6.1</version>
    </dependency>
</dependencies>
2. Write the code
package xuwei.tech;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Requirement: sliding-window computation
 *
 * Word data is simulated through a socket,
 * and Flink computes statistics over it.
 *
 * Every 1 second, the data from the most recent 2 seconds must be aggregated.
 */
public class SocketWindowWordCountJava {

    public static void main(String[] args) throws Exception {
        // Read the port number from the arguments
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No port set. Using default port 9000");
            port = 9000;
        }

        // Obtain the Flink execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        String hostname = "flink102";
        String delimiter = "\n"; // newline marks the end of a record

        // Connect to the socket and read the input data
        DataStreamSource<String> socketWord = env.socketTextStream(hostname, port, delimiter);

        // Input line: a b c
        // Emitted records:
        //   a 1
        //   b 1
        //   c 1
        DataStream<WordWithCount> windowCounts = socketWord.flatMap(new FlatMapFunction<String, WordWithCount>() {
                    @Override
                    public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                        String[] splits = value.split("\\s");
                        for (String word : splits) {
                            out.collect(new WordWithCount(word, 1L));
                        }
                    }
                }).keyBy("word")
                // Every 1 second, count over the most recent 2 seconds
                // (window size 2 seconds, slide interval 1 second)
                .timeWindow(Time.seconds(2), Time.seconds(1))
                .sum("count"); // either sum or reduce works here
                /* The reduce variant additionally needs
                   import org.apache.flink.api.common.functions.ReduceFunction;
                .reduce(new ReduceFunction<WordWithCount>() {
                    @Override
                    public WordWithCount reduce(WordWithCount a, WordWithCount b) throws Exception {
                        return new WordWithCount(a.word, a.count + b.count);
                    }
                })*/

        // Print the results to the console with a parallelism of 1
        windowCounts.print().setParallelism(1);

        // This call is required; otherwise the program does not run
        env.execute("Socket window count");
    }

    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" +
                    "word='" + word + '\'' +
                    ", count=" + count +
                    '}';
        }
    }
}
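/**
 * Steps for developing a Flink program:
 * 1: Obtain an execution environment
 * 2: Load/create the initial data
 * 3: Specify the transformation operators to apply to the data
 * 4: Specify where to put the computed results
 * 5: Call execute() to trigger execution
 * Note: Flink programs are evaluated lazily; execution is only triggered when execute() is finally called.
 * Benefit of lazy evaluation: you can write a complex program, and Flink turns it into a Plan that is executed as a single unit.
 */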
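The lazy-evaluation note above can be illustrated with a small plain-Java sketch. `LazyPlanSketch` is a hypothetical stand-in, not Flink's API. It only records transformations when they are declared and runs them when `execute()` is called, which is the general pattern the note describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an execution environment; this is not Flink's API.
class LazyPlanSketch {
    private final List<Function<List<String>, List<String>>> steps = new ArrayList<>();
    private final List<String> source;

    LazyPlanSketch(List<String> source) {
        this.source = source;
    }

    // Records a transformation without running it
    LazyPlanSketch transform(Function<List<String>, List<String>> step) {
        steps.add(step);
        return this;
    }

    int plannedSteps() {
        return steps.size();
    }

    // Runs all recorded steps in declaration order
    List<String> execute() {
        List<String> data = source;
        for (Function<List<String>, List<String>> step : steps) {
            data = step.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        int[] calls = {0}; // counts how many times a step actually ran
        LazyPlanSketch plan = new LazyPlanSketch(Arrays.asList("a b", "c"))
                .transform(lines -> {
                    calls[0]++;
                    List<String> words = new ArrayList<>();
                    for (String line : lines) {
                        words.addAll(Arrays.asList(line.split("\\s+")));
                    }
                    return words;
                });
        System.out.println("steps run before execute(): " + calls[0]); // 0
        System.out.println("result: " + plan.execute());               // [a, b, c]
    }
}
```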
Next, run the following commands on the virtual machine:
// The nc command turns out not to be installed
[root@flink102 ~]# nc -l 9000
-bash: nc: command not found
// Install it:
[root@flink102 ~]# yum install -y nc
// Listen on port 9000
[root@flink102 ~]# nc -l 9000
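The port that nc listens on has to match the port the job connects to. In the program above, the port is read with `ParameterTool.fromArgs(args)`, and any failure falls back to 9000. The sketch below mimics that fallback in plain Java without Flink. `PortArgSketch` and `resolvePort` are hypothetical names, and the parsing is deliberately simplified to the `--port <number>` form.

```java
// Plain-Java stand-in for the port lookup; names are hypothetical, not Flink API.
class PortArgSketch {
    static final int DEFAULT_PORT = 9000;

    // Looks for "--port <number>" in args; falls back to DEFAULT_PORT
    // when the flag is absent or its value is not a valid integer.
    static int resolvePort(String[] args) {
        for (int i = 0; i + 1 < args.length; i++) {
            if ("--port".equals(args[i])) {
                try {
                    return Integer.parseInt(args[i + 1]);
                } catch (NumberFormatException e) {
                    break; // malformed value: use the default
                }
            }
        }
        return DEFAULT_PORT;
    }

    public static void main(String[] args) {
        System.out.println(resolvePort(new String[] {"--port", "9999"})); // 9999
        System.out.println(resolvePort(new String[0]));                   // 9000
    }
}
```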
Start the program; the console prints:
The firewall on the virtual machine needs to be stopped:
[root@flink102 ~]# systemctl stop firewalld.service
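// Enter data (the port given to nc must match the port the job connects to: 9000 by default, so listening on 9999 requires starting the job with --port 9999)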
[root@flink102 ~]# nc -l 9999
a a
a a1
b b1
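As a rough check of what to expect from this input, the following plain-Java sketch applies the same whitespace split and per-word summing as the job, under the assumption that all three lines fall into a single window. `WordCountSketch` is a hypothetical name, and the sketch leaves out Flink's windowing entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java sketch of the split-and-sum logic for one window; not Flink code.
class WordCountSketch {

    // Splits each line on whitespace and adds 1 per word, like the job's
    // flatMap followed by keyBy("word") and sum("count") within one window.
    static Map<String, Long> count(String... lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : line.split("\\s")) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three lines typed into nc above
        System.out.println(count("a a", "a a1", "b b1")); // {a=3, a1=1, b=1, b1=1}
    }
}
```

In the real job the lines may arrive in different windows, and because each element belongs to two overlapping windows, a word typically shows up in two consecutive outputs.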