Log Collection: Integrating Flume and Kafka
For log-file collection we use Flume together with Kafka: Flume handles the aggregation of the logs, while Kafka buffers the data stream and absorbs traffic peaks.
Flume can act either as a Kafka producer or as a Kafka consumer.
1 Flume as a producer
When Flume acts as a producer, its internal architecture takes one of two forms, source-channel or source-channel-sink, sketched below:
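netcat source -> KafkaChannel -> Kafka topic                         (source-channel)
netcat source -> memory channel -> Kafka sink -> Kafka topic         (source-channel-sink)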
(1) The source-channel architecture
In the source-channel architecture the channel is a KafkaChannel; since the KafkaChannel writes events directly to a Kafka topic, no sink is needed.
Create the configuration file kafka_channel.conf:
# netcat source
a1.sources = r1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
# kafka channel
a1.channels = c1
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop102:9092
a1.channels.c1.kafka.topic = flume01
a1.channels.c1.kafka.consumer.group.id = flume-consume
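# by default the KafkaChannel wraps each event in Flume's Avro format;
# storing plain bodies keeps the console-consumer output below readable
a1.channels.c1.parseAsFlumeEvent = false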
# bind the source to the channel
a1.sources.r1.channels = c1
Start the ZooKeeper cluster and then the Kafka cluster, and list the topics that currently exist in Kafka:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
Start Flume, then use netcat to generate some data:
[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_channel.conf -Dflume.root.logger=INFO,console
[jl@hadoop102 ~]$ nc localhost 6666
hello
OK
List the topics again; the flume01 topic has been created automatically:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
Start a console consumer to verify that the message arrived:
[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume01
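The console consumer only shows records sent after it starts; to replay the record sent earlier, add the --from-beginning flag:
[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume01 --from-beginning
hello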
(2) The source-channel-sink architecture
In the source-channel-sink architecture, the source is a netcat source, the channel is a memory channel, and the sink is a Kafka sink.
Prepare the configuration file kafka_sink.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# netcat source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
# channel
a1.channels.c1.type = memory
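# capacity is the maximum number of events the channel holds;
# transactionCapacity is the maximum per transaction and must not exceed capacity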
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = hadoop102:9092
a1.sinks.k1.kafka.topic = flume02
a1.sinks.k1.kafka.producer.acks = 1
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
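Optionally, the flume02 topic can be created ahead of time instead of relying on Kafka's automatic topic creation; the partition and replication-factor values below are illustrative only:
[jl@hadoop102 job]$ kafka-topics.sh --create --bootstrap-server hadoop102:9092 --topic flume02 --partitions 3 --replication-factor 2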
Start ZooKeeper and Kafka as before, and list the current topics:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
Start Flume and generate data with netcat:
[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_sink.conf -Dflume.root.logger=INFO,console
[jl@hadoop102 ~]$ nc localhost 6666
hello
OK
List the topics again; flume02 now appears:
[jl@hadoop102 job]$ kafka-topics.sh --list --bootstrap-server hadoop102:9092
__consumer_offsets
flume01
flume02
Start a console consumer on flume02:
[jl@hadoop102 job]$ kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic flume02
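To see which consumer groups are attached to the cluster (the console consumer generates a group name automatically unless --group is passed), the consumer-groups tool can be used:
[jl@hadoop102 job]$ kafka-consumer-groups.sh --bootstrap-server hadoop102:9092 --list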
2 Flume as a consumer
Here Flume consumes from Kafka: a Kafka source reads the flume01 topic, events pass through a memory channel, and a logger sink prints them to the Flume console.
Prepare the configuration file kafka_source.conf:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# kafka source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.bootstrap.servers = hadoop102:9092
a1.sources.r1.kafka.topics = flume01
a1.sources.r1.kafka.consumer.group.id = custom.g.id
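# maximum number of records written to the channel in one batch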
a1.sources.r1.batchSize = 100
# memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# logger sink
a1.sinks.k1.type = logger
# bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the ZooKeeper and Kafka clusters, then start Flume:
[jl@hadoop102 flume]$ bin/flume-ng agent -n a1 -c conf/ -f job/kafka_source.conf -Dflume.root.logger=INFO,console
Start a Kafka console producer and send some data:
[jl@hadoop102 flume]$ kafka-console-producer.sh --broker-list hadoop102:9092 --topic flume01
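Each record then shows up on the Flume console via the logger sink. For a record containing hello, the output looks roughly like this (headers and hex dump abbreviated):
Event: { headers:{topic=flume01, partition=0, ...} body: 68 65 6C 6C 6F    hello }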