使用flume+kafka+storm构建实时日志分析系统

程序员文章站 2022-05-21 20:13:44

...

使用flume+kafka+storm构建实时日志分析系统

本文只会涉及flume和kafka的结合，kafka和storm的结合可以参考其他博客
1. flume安装使用
下载flume安装包http://www.apache.org/dyn/closer.cgi/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz
解压$ tar -xzvf apache-flume-1.5.2-bin.tar.gz -C /opt/flume
flume配置文件放在conf文件目录下，执行文件放在bin文件目录下。
1）配置flume
进入conf目录将flume-conf.properties.template拷贝一份，并命名为自己需要的名字
$ cp flume-conf.properties.template flume.conf
修改flume.conf的内容，我们使用file sink来接收channel中的数据，channel采用memory channel，source采用exec source，配置文件如下：


agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /data/mongodata/mongo.log
#agent.sources.seqGenSrc.bind = 172.168.49.130

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = file_roll
agent.sinks.loggerSink.sink.directory = /data/flume

#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 1000
agent.channels.memory4log.transactionCapacity = 100

2）运行flume agent
切换到bin目录下，运行一下命令：
$ ./flume-ng agent --conf ../conf -f ../conf/flume.conf --n agent -Dflume.root.logger=INFO,console
在/data/flume目录下可以看到生成的日志文件。

2. 结合kafka
由于flume1.5.2没有kafka sink，所以需要自己开发kafka sink
可以参考flume 1.6里面的kafka sink，但是要注意使用的kafka版本，由于有些kafka api不兼容的
这里只提供核心代码，process()内容。


Sink.Status status = Status.READY;




Channel ch = getChannel();


Transaction transaction = null;


Event event = null;


String eventTopic = null;


String eventKey = null;




try {


transaction = ch.getTransaction();


transaction.begin();


messageList.clear();




if (type.equals("sync")) {


event = ch.take();




 if (event != null) {


        byte[] tempBody = event.getBody();


 String eventBody = new String(tempBody,"UTF-8");


 Map headers = event.getHeaders();




 if ((eventTopic = headers.get(TOPIC_HDR)) == null) {


          eventTopic = topic;


 }




        eventKey = headers.get(KEY_HDR);




 if (logger.isDebugEnabled()) {


 logger.debug("{Event} " + eventTopic + " : " + eventKey + " : "


 + eventBody);


 }


 


        ProducerData data = new ProducerData


 (eventTopic, new Message(tempBody));


 


 long startTime = System.nanoTime();


 logger.debug(eventTopic+"++++"+eventBody);


        producer.send(data);


 long endTime = System.nanoTime(); 
 }


} else {


long processedEvents = 0;


for (; processedEvents 
event = ch.take();




 if (event == null) {


 break;


 }




 byte[] tempBody = event.getBody();


 String eventBody = new String(tempBody,"UTF-8");


 Map headers = event.getHeaders();




 if ((eventTopic = headers.get(TOPIC_HDR)) == null) {


          eventTopic = topic;


 }




        eventKey = headers.get(KEY_HDR);




 if (logger.isDebugEnabled()) {


 logger.debug("{Event} " + eventTopic + " : " + eventKey + " : "


 + eventBody);


 logger.debug("event #{}", processedEvents);


 }




 // create a message and add to buffer


        ProducerData data = new ProducerData


 (eventTopic, eventBody);


        messageList.add(data);


}




// publish batch and commit.


 if (processedEvents > 0) {


 long startTime = System.nanoTime(); 
 long endTime = System.nanoTime(); 
 }


}




transaction.commit();


} catch (Exception ex) {


String errorMsg = "Failed to publish events";


logger.error("Failed to publish events", ex);


status = Status.BACKOFF;


if (transaction != null) {


try {


transaction.rollback(); 
} catch (Exception e) {


logger.error("Transaction rollback failed", e);


throw Throwables.propagate(e);


}


}


throw new EventDeliveryException(errorMsg, ex);


} finally {


if (transaction != null) {


transaction.close();


}


}




return status;

下一步，修改flume配置文件，将其中sink部分的配置改成kafka sink，如：


producer.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink


producer.sinks.r.brokerList = bigdata-node00:9092


producer.sinks.r.requiredAcks = 1


producer.sinks.r.batchSize = 100


#producer.sinks.r.kafka.producer.type=async


#producer.sinks.r.kafka.customer.encoding=UTF-8


producer.sinks.r.topic = testFlume1

type指向kafkasink所在的完整路径
下面的参数都是kafka的一系列参数，最重要的是brokerList和topic参数

现在重新启动flume，就可以在kafka的对应topic下查看到对应的日志

使用flume+kafka+storm构建实时日志分析系统

使用flume+kafka+storm构建实时日志分析系统

php高性能日志系统 seaslog 的安装与使用方法分析

使用apachetop实时监控日志、动态分析服务器运行状态

Spark 实战, 第 2 部分:使用 Kafka 和 Spark Streaming 构建实时数据处理系统

使用py 和flask 实现的服务器系统目录浏览，日志文件实时显示到网页的功能

使用perl+MongoDB实现一个WEB站点请求耗时日志分析系统

使用flume+kafka+storm构建实时日志分析系统

使用apachetop实时监控日志、动态分析服务器运行状态

如何利用系统日志和实时获取实例屏幕截图分析排查实例故障

php高性能日志系统 seaslog 的安装与使用方法分析

如何利用系统日志和实时获取实例屏幕截图分析排查实例故障