Log Collection: Integrating Flume and Kafka


To collect log files, we combine Flume and Kafka: Flume handles log aggregation, while Kafka buffers the data stream and absorbs traffic peaks.
[Figure: overall architecture of the Flume and Kafka integration]
As the figure above shows, Flume can act either as a Kafka producer or as a Kafka consumer.
1 Flume as a producer
When Flume acts as a producer, its internal pipeline can take one of two forms: source-channel, or source-channel-sink.
(1) The source-channel architecture
In the source-channel architecture, the channel is a Kafka Channel. The requirement analysis is as follows:
[Figure: requirement analysis for the source-channel setup]
Write the configuration file kafka_channel.conf:

# netcat source
a1.sources = r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666

# kafka channel
a1.channels = c1
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop102:9092
a1.channels.c1.kafka.topic = flume01
a1.channels.c1.kafka.consumer.group.id = flume-consume

# bind the source to the channel
a1.sources.r1.channels = c1
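
Note that, by default, a Kafka Channel serializes each event into the topic in Flume's Avro event format, so a plain console consumer will see some framing bytes around the message body. Since this example reads the topic directly with the console consumer rather than with another Flume agent, it can be convenient to store only the raw body. A minimal sketch, using the Kafka Channel's documented parseAsFlumeEvent property:

# optional: write only the raw event body instead of Avro-serialized Flume events
a1.channels.c1.parseAsFlumeEvent = false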

Start the ZooKeeper and Kafka clusters in turn, then check the topics in Kafka:

[jl@hadoop102 job]$  kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets

Start Flume, then generate data with nc from another terminal:

[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_channel.conf -Dflume.root.logger=INFO,console
[jl@hadoop102 ~]$ nc localhost 6666
hello      
OK

Check the topics again:

[jl@hadoop102 job]$  kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
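
The new topic can also be inspected in more detail (partitions, replicas, and so on) with the standard describe option of kafka-topics.sh:

[jl@hadoop102 job]$ kafka-topics.sh --describe --bootstrap-server hadoop102:9092 --topic flume01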

Start a consumer:

[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume01
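
The console consumer only prints messages produced after it starts; to replay what is already stored in the topic, the standard --from-beginning flag can be added:

[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume01 --from-beginning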

(2) The source-channel-sink architecture
In the source-channel-sink architecture, the source is a netcat source, the channel is a memory channel, and the sink is a Kafka sink. The requirement analysis is as follows:
[Figure: requirement analysis for the source-channel-sink setup]
Prepare the configuration file kafka_sink.conf:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# netcat source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666

#  channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = hadoop102:9092
a1.sinks.k1.kafka.topic = flume02

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
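
The producer side of the Kafka sink can additionally be tuned through its documented properties, for example the write batch size and the acknowledgment level. A minimal sketch with illustrative values (not part of the original setup):

# optional producer tuning for the Kafka sink (illustrative values)
a1.sinks.k1.kafka.flumeBatchSize = 100
a1.sinks.k1.kafka.producer.acks = 1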

Start the ZooKeeper and Kafka clusters in turn, then check the topics in Kafka:

[jl@hadoop102 job]$  kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01

Start Flume, then generate data with nc from another terminal:

[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_sink.conf -Dflume.root.logger=INFO,console
[jl@hadoop102 ~]$ nc localhost 6666
hello      
OK

Check the topics again:

[jl@hadoop102 job]$  kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
flume02

Start a consumer:

[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume02

2 Flume as a consumer
The requirement analysis is as follows:
[Figure: requirement analysis for Flume as a consumer]
Prepare the configuration file kafka_source.conf:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# kafka source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = hadoop102:9092
a1.sources.r1.kafka.topics = flume01
a1.sources.r1.kafka.consumer.group.id = custom.g.id
a1.sources.r1.batchSize = 100


# memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# logger sink
a1.sinks.k1.type = logger

# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
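
The Kafka source also passes any property under the kafka.consumer.* prefix straight through to the underlying Kafka consumer. For example, to read the topic from its earliest offset when the consumer group has no committed offset yet (illustrative, using a standard Kafka consumer setting):

# optional: consume from the earliest offset on first start
a1.sources.r1.kafka.consumer.auto.offset.reset = earliest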

Start the ZooKeeper and Kafka clusters in turn, then start Flume:

[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_source.conf -Dflume.root.logger=INFO,console

Start a Kafka console producer and send some data:

[jl@hadoop102 flume]$ kafka-console-producer.sh --broker-list hadoop102:9092 --topic flume01


Tags: flume kafka