Log Collection: Integrating Flume and Kafka
For log-file collection we use Flume together with Kafka: Flume handles the aggregation of the logs, while Kafka buffers the data stream and absorbs traffic peaks.
Flume can act either as a Kafka producer or as a Kafka consumer.
1 Flume as a producer
When Flume acts as a producer, its internal architecture takes one of two forms, source-channel or source-channel-sink, sketched below:
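netcat source -> KafkaChannel -> Kafka topic                         (source-channel)
netcat source -> memory channel -> Kafka sink -> Kafka topic         (source-channel-sink)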
(1) The source-channel architecture
In the source-channel architecture the channel is a KafkaChannel; since the KafkaChannel writes events directly to a Kafka topic, no sink is needed.
Create the configuration file kafka_channel.conf:
# netcat source
a1.sources = r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
# kafka channel
a1.channels = c1
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop102:9092
a1.channels.c1.kafka.topic = flume01
a1.channels.c1.kafka.consumer.group.id = flume-consume
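# by default the KafkaChannel wraps each event in Flume's Avro format;
# storing plain bodies keeps the console-consumer output below readable
a1.channels.c1.parseAsFlumeEvent = false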
# bind the source to the channel
a1.sources.r1.channels = c1
Start the ZooKeeper cluster and then the Kafka cluster, and list the topics that currently exist in Kafka:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
Start Flume, then use netcat to generate some data:
[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_channel.conf -Dflume.root.logger=INFO,console
[jl@hadoop102 ~]$ nc localhost 6666
hello
OK
List the topics again; the flume01 topic has been created automatically:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
Start a console consumer to verify that the message arrived:
[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume01
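The console consumer only shows records sent after it starts; to replay the record sent earlier, add the --from-beginning flag:
[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume01 --from-beginning
hello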
(2) The source-channel-sink architecture
In the source-channel-sink architecture, the source is a netcat source, the channel is a memory channel, and the sink is a Kafka sink.
Prepare the configuration file kafka_sink.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# netcat source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
# channel
a1.channels.c1.type = memory
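# capacity is the maximum number of events the channel holds;
# transactionCapacity is the maximum per transaction and must not exceed capacity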
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = hadoop102:9092
a1.sinks.k1.kafka.topic = flume02
a1.sinks.k1.kafka.producer.acks = 1
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
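Optionally, the flume02 topic can be created ahead of time instead of relying on Kafka's automatic topic creation; the partition and replication-factor values below are illustrative only:
[jl@hadoop102 job]$ kafka-topics.sh --create --bootstrap-server hadoop102:9092 --topic flume02 --partitions 3 --replication-factor 2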
Start ZooKeeper and Kafka as before, and list the current topics:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
Start Flume and generate data with netcat:
[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_sink.conf -Dflume.root.logger=INFO,console
[jl@hadoop102 ~]$ nc localhost 6666
hello
OK
List the topics again; flume02 now appears:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
flume02
Start a console consumer on flume02:
[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume02
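To see which consumer groups are attached to the cluster (the console consumer generates a group name automatically unless --group is passed), the consumer-groups tool can be used:
[jl@hadoop102 job]$ kafka-consumer-groups.sh --bootstrap-server hadoop102:9092 --list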
2 Flume as a consumer
Here Flume consumes from Kafka: a Kafka source reads the flume01 topic, events pass through a memory channel, and a logger sink prints them to the Flume console.
Prepare the configuration file kafka_source.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# kafka source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = hadoop102:9092
a1.sources.r1.kafka.topics = flume01
a1.sources.r1.kafka.consumer.group.id = custom.g.id
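# maximum number of records written to the channel in one batch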
a1.sources.r1.batchSize = 100
# memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# logger sink
a1.sinks.k1.type = logger
# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the ZooKeeper and Kafka clusters, then start Flume:
[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_source.conf -Dflume.root.logger=INFO,console
Start a Kafka console producer and send some data:
[jl@hadoop102 flume]$ kafka-console-producer.sh --broker-list hadoop102:9092 --topic flume01
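Each record then shows up on the Flume console via the logger sink. For a record containing hello, the output looks roughly like this (headers and hex dump abbreviated):
Event: { headers:{topic=flume01, partition=0, ...} body: 68 65 6C 6C 6F    hello }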