Storing logs with log4j + Flume + HDFS
2022-06-14 19:57:26
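1. HDFS configuration
1.1. Setting up the Hadoop cluster
For the HDFS setup, refer to the guide "Building a Hadoop 2.7.3 cluster on CentOS 7.0". To keep things simple, this example runs on a single machine: just extract Hadoop into /opt/hadoop/.
1.2. HDFS configuration
- $HADOOP_HOME/etc/hadoop/core-site.xml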
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
1.3. Start Hadoop
[itlocals-MacBook-Pro: david.tian]$sh $HADOOP_HOME/sbin/start-all.sh
1.4. Create the /flume directory in HDFS
itlocals-MacBook-Pro: david.tian$hadoop fs -mkdir /flume
1.5. Change the directory's read/write permissions in HDFS
itlocals-MacBook-Pro: david.tian$hadoop fs -chmod -R 777 /flume
2. Installing and configuring Flume
2.1. Extract Flume into /opt/flume
2.2. Create the configuration file flume2hdfs.conf in $FLUME_HOME/conf/
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type=avro
a1.sources.r1.bind=localhost
a1.sources.r1.port=44446
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=1000
a1.channels.c1.keep-alive=30
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://localhost:9000/flume
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.rollInterval=100
a1.sinks.k1.hdfs.rollSize=10240
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.idleTimeout=600
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
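The three roll settings in this configuration work together: the HDFS sink closes the in-use file and renames it to its final name as soon as any enabled trigger fires, and a value of 0 disables that trigger. The following is a minimal Java sketch of that rule, not Flume's actual implementation; the names RollPolicy and shouldRoll are hypothetical and exist only for illustration.

```java
// Hypothetical illustration of the HDFS sink roll rule; this is not Flume's real code.
final class RollPolicy {
    private final long rollIntervalSeconds; // hdfs.rollInterval, 0 = disabled
    private final long rollSizeBytes;       // hdfs.rollSize, 0 = disabled
    private final long rollCountEvents;     // hdfs.rollCount, 0 = disabled

    RollPolicy(long rollIntervalSeconds, long rollSizeBytes, long rollCountEvents) {
        this.rollIntervalSeconds = rollIntervalSeconds;
        this.rollSizeBytes = rollSizeBytes;
        this.rollCountEvents = rollCountEvents;
    }

    /** Returns true when at least one enabled trigger has been reached. */
    boolean shouldRoll(long openSeconds, long bytesWritten, long eventsWritten) {
        boolean byTime = rollIntervalSeconds > 0 && openSeconds >= rollIntervalSeconds;
        boolean bySize = rollSizeBytes > 0 && bytesWritten >= rollSizeBytes;
        boolean byCount = rollCountEvents > 0 && eventsWritten >= rollCountEvents;
        return byTime || bySize || byCount;
    }

    public static void main(String[] args) {
        // Values from the configuration: rollInterval=100, rollSize=10240, rollCount=0
        RollPolicy policy = new RollPolicy(100, 10240, 0);
        System.out.println(policy.shouldRoll(30, 2048, 500));  // false: no trigger reached
        System.out.println(policy.shouldRoll(30, 10240, 500)); // true: size reached
        System.out.println(policy.shouldRoll(100, 0, 0));      // true: interval reached
    }
}
```

With rollCount set to 0, the number of events never causes a roll, so files here are closed after 100 seconds or 10 KB, whichever comes first.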
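2.3. Important Flume HDFS sink parameters
Apart from type, every parameter below is written with the hdfs. prefix in the configuration file, for example a1.sinks.k1.hdfs.path.
- type: the sink type; hdfs makes Flume write its output to HDFS;
- path: the HDFS path to write to; it must include the file system identifier, e.g. hdfs://localhost:9000/flume;
- filePrefix: default FlumeData; the prefix of the file names written to HDFS. Flume's date escape sequences and the %{host} expression can be used;
- fileSuffix: the suffix of the file names written to HDFS, e.g. .lzo or .log;
- inUsePrefix: the file name prefix of temporary files. The HDFS sink first writes to a temporary file in the target directory and then renames it to the final file according to the roll rules;
- inUseSuffix: default .tmp; the file name suffix of temporary files;
- rollInterval: default 30; the interval, in seconds, after which the HDFS sink rolls the temporary file into the final target file. If set to 0, files are not rolled based on time;
- rollSize: default 1024; when the temporary file reaches this size (in bytes), it is rolled into the target file. If set to 0, files are not rolled based on size;
- rollCount: default 10; when the number of events reaches this value, the temporary file is rolled into the target file. If set to 0, files are not rolled based on the number of events;
- idleTimeout: default 0 (disabled); if the currently open temporary file receives no data for this many seconds, it is closed and renamed to the target file;
- batchSize: default 100; the number of events written per batch before the file is flushed to HDFS;
- codeC: the compression codec: gzip, bzip2, lzo, lzop, or snappy;
- fileType: default SequenceFile; the file format, one of SequenceFile, DataStream, or CompressedStream. With DataStream the file is not compressed and hdfs.codeC need not be set; with CompressedStream a valid hdfs.codeC value must be set;
- maxOpenFiles: default 5000; the maximum number of HDFS files allowed to be open. When this number is reached, the file opened earliest is closed;
- minBlockReplicas: defaults to the HDFS replication factor; the minimum number of replicas per HDFS block. This parameter affects file rolling, and it is usually set to 1 so that files roll according to the configured rules;
- writeFormat: the format for sequence file records: Text or Writable (the default);
- callTimeout: default 10000; the timeout for HDFS operations, in milliseconds;
- threadsPoolSize: default 10; the number of threads the HDFS sink starts for HDFS operations;
- rollTimerPoolSize: default 1; the number of threads the HDFS sink starts for time-based file rolling;
- kerberosPrincipal: the Kerberos principal for HDFS security authentication;
- kerberosKeytab: the Kerberos keytab for HDFS security authentication;
- proxyUser: the proxy user;
- round: default false; whether timestamps are rounded down. If enabled, this affects all time-based escape sequences except %t;
- roundValue: default 1; the value to which the time is rounded down;
- roundUnit: default second; the unit used for rounding down, one of second, minute, or hour. For example: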
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
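Because timestamps are rounded down to the nearest 10 minutes, a new directory is created every 10 minutes when hdfs.path contains time escape sequences;
- timeZone: defaults to the local time zone;
- useLocalTimeStamp: default false; whether to use the local time, instead of the timestamp in the event header, when replacing escape sequences;
- closeTries: default 0; the number of times the HDFS sink tries to close a file. With 0, the sink keeps retrying after a failed close until it succeeds. With 1, the sink does not retry after a failed close, and the unclosed file stays where it is in an open state;
- retryInterval: default 180 seconds; the interval between attempts to close a file. If set to 0, the sink does not retry, which is equivalent to setting hdfs.closeTries to 1;
- serializer: default TEXT; the serializer type. Other options are avro_event or the fully qualified class name of any implementation of EventSerializer.Builder;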
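To make the effect of round, roundValue, and roundUnit concrete, the sketch below rounds a timestamp down to a multiple of 10 minutes, which is the kind of adjustment the sink applies before expanding time escape sequences in hdfs.path. It is an illustration written with java.time rather than Flume's own code; the class name TimestampRounding and the sample timestamp are made up, and the calculation works on epoch milliseconds in UTC, so it ignores time zone handling.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of "round down to N minutes"; not Flume's real code.
final class TimestampRounding {
    /** Rounds epochMillis down to a multiple of roundValue minutes (UTC). */
    static long roundDownMinutes(long epochMillis, int roundValue) {
        long bucketMillis = roundValue * 60_000L;
        return epochMillis - Math.floorMod(epochMillis, bucketMillis);
    }

    public static void main(String[] args) {
        DateTimeFormatter fmt =
            DateTimeFormatter.ofPattern("yyyy-MM-dd/HHmm").withZone(ZoneOffset.UTC);
        long ts = Instant.parse("2017-09-08T11:54:34Z").toEpochMilli();
        long rounded = roundDownMinutes(ts, 10);
        System.out.println(fmt.format(Instant.ofEpochMilli(ts)));      // 2017-09-08/1154
        System.out.println(fmt.format(Instant.ofEpochMilli(rounded))); // 2017-09-08/1150
    }
}
```

Every event with a timestamp between 11:50:00 and 11:59:59 maps to the same rounded value, so all of them land in the same directory.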
2.4. Start Flume
[itlocals-MacBook-Pro:flume david.tian]$ bin/flume-ng agent -n a1 -c conf/ --conf-file conf/flume2hdfs.conf -Dflume.root.logger=DEBUG,console
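Before starting the log4j client, it can help to confirm that the agent's Avro source is actually listening. The sketch below only checks that a TCP connection to a host and port can be opened; it does not speak the Avro protocol. The class name PortCheck and the 2-second timeout are arbitrary choices, and localhost:44446 is taken from the configuration in 2.2.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a plain TCP reachability check, not a Flume API.
final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, host unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", 44446, 2000);
        System.out.println(up
            ? "Avro source is reachable on localhost:44446"
            : "Nothing is listening on localhost:44446");
    }
}
```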
3. Sending logs from log4j to Flume
The source code can be downloaded from my GitHub: https://github.com/david-louis-tian/dBD
3.1. Only pom.xml, the log-simulating code, and log4j.properties are shown here
- pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dvtn.www</groupId>
<artifactId>dBD</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>dBD</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<slf4j.version>1.7.25</slf4j.version>
<log4j.version>1.2.17</log4j.version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<!-- Log dependencies -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>${log4j.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.json/json -->
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20170516</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.avro/avro -->
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.8.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flume/flume-ng-core -->
<dependency>
<groupId>org.apache.flume</groupId>
<artifactId>flume-ng-core</artifactId>
<version>1.7.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.flume.flume-ng-clients/flume-ng-log4jappender -->
<dependency>
<groupId>org.apache.flume.flume-ng-clients</groupId>
<artifactId>flume-ng-log4jappender</artifactId>
<version>1.7.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.avro/avro-ipc -->
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro-ipc</artifactId>
<version>1.8.2</version>
</dependency>
</dependencies>
</project>
- log4j.properties
################### set log levels ###############
log4j.rootLogger = INFO,stdout,file,flume
################### flume ########################
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
log4j.appender.flume.layout = org.apache.log4j.PatternLayout
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 44446
################## stdout #######################
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold = INFO
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
################## file ##########################
log4j.appender.file = org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.Threshold = INFO
log4j.appender.file.File = /Users/david.tian/logs/tracker/tracker.log
log4j.appender.file.Append = true
log4j.appender.file.DatePattern = '.'yyyy-MM-dd
log4j.appender.file.layout = org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern = %d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
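A common pitfall with this setup is a mismatch between the appender's Hostname and Port and the bind and port of the Avro source in the Flume configuration, or an appender that is listed in log4j.rootLogger but never defined. The stdlib-only sketch below parses the properties text and reports the Flume endpoint so it can be compared with the agent's settings. The class name Log4jFlumeCheck and its helper are hypothetical and are not part of log4j or Flume.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical sanity check for the log4j.properties shown in this section.
final class Log4jFlumeCheck {
    /** Returns "host:port" for the named appender, or null if it is missing or incomplete. */
    static String flumeEndpoint(Properties props, String appenderName) {
        String prefix = "log4j.appender." + appenderName;
        String host = props.getProperty(prefix + ".Hostname");
        String port = props.getProperty(prefix + ".Port");
        if (props.getProperty(prefix) == null || host == null || port == null) {
            return null;
        }
        return host.trim() + ":" + Integer.parseInt(port.trim());
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
            "log4j.rootLogger = INFO,stdout,file,flume",
            "log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender",
            "log4j.appender.flume.Hostname = localhost",
            "log4j.appender.flume.Port = 44446");
        Properties props = new Properties();
        props.load(new StringReader(text));
        System.out.println(flumeEndpoint(props, "flume")); // localhost:44446
    }
}
```

The printed endpoint should match a1.sources.r1.bind and a1.sources.r1.port in the Flume configuration.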
- SendReceipts.java
package com.dvtn.www.log4j.jsonlog;
import com.dvtn.www.log4j.logfile.LogProducer;
import com.dvtn.www.model.Area;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.*;
import java.util.*;
/**
* Created by david.tian on 08/09/2017.
*/
public class SendReceipts {
// Use this class as the logger name so log lines are attributed to SendReceipts.
private static final Logger LOG = LoggerFactory.getLogger(SendReceipts.class);
private static String path = SendReceipts.class.getResource("/").getPath();
private static String areaJsonString;
private static String city;
private static String cityKey;
private static String province;
private static String provinceKey;
private static int separator;
private static String phonePrefix;
//private static final Random rnd = new Random();
private static String[] payers = {"Merchants", "Individuals"};
private static String[] managers = {"david", "josen", "fab", "simon", "banana", "tom", "scott", "ekrn", "sunshine", "lily", "kudu", "scala", "spark", "flume", "storm", "kafka", "avro", "linux"};
private static String[] terminalTypes = {"RNM", "CNM", "RNM", "GNM", "CNJ", "GNJ", "RNJ", "GNM", "CNM"};
private static String[] stores = {"连锁店", "分营店", "工厂店", "会员店", "直销店"};
private static String[] items = {"面包","酒","油","牛奶","蔬菜","猪肉","牛肉","羊肉","曲奇","手机","耳机","面粉","大米","糖","苹果","茶叶","书","植物","玩具","床","锅","牙膏","洗衣粉","酱油","金鱼","干货"};
private static String[] itemsType ={"食物","酒水","饮料","日用品","电子","数码","娱乐","家俱"};
public static void main(String[] args) {
Timer timer = new Timer();
timer.schedule(new TimerTask() {
@Override
public void run() {
Random rnd = new Random();
ProduceReceipts pr = new ProduceReceipts();
areaJsonString = pr.readJSON(path, "area.json");
String transactionID = System.currentTimeMillis() + ""+Math.round(Math.random() * 9000 + 1000);
String transactionDate = System.currentTimeMillis() + "";
String taxNumber = Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000);
String invoiceId = System.currentTimeMillis() + "";
String invoiceNumber = Math.round(Math.random() * 900000000 + 100000000) + "";
String invoiceDate = System.currentTimeMillis() + "";
List<Area> listArea = pr.listArea(areaJsonString);
int idx = rnd.nextInt(listArea.size());
String provinceID = listArea.get(idx).getProvinceID();
String provinceName = listArea.get(idx).getProvinceName();
String cityID = listArea.get(idx).getCityID();
String cityName = listArea.get(idx).getCityName();
String telephone = provinceID + "-" + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000);
int managerSize = managers.length;
String manger = managers[rnd.nextInt(managerSize)];
int payerSize = payers.length;
String payer = payers[rnd.nextInt(payerSize)];
String operator = "OP" + Math.round(Math.random() * 90000 + 10000);
int terminalTypeSize = terminalTypes.length;
String terminalNumber = terminalTypes[rnd.nextInt(terminalTypeSize)] + Math.round(Math.random() * 90000 + 10000);
String account = pr.StringReplaceWithStar(Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000));
String tcNumber = Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + " " + Math.round(Math.random() * 9000 + 1000) + "";
// Read receipts.avsc line by line as a debugging aid; the lines are not used further.
File file = new File(path + "receipts.avsc");
try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
String line;
while ((line = reader.readLine()) != null) {
// System.out.println("========>" + line);
}
} catch (IOException e) {
e.printStackTrace();
}
try {
// Parse the complete schema
Schema schema = new Schema.Parser().parse(new File(path + "receipts.avsc"));
GenericRecord record = new GenericData.Record(schema);
// Get the fields from the schema
int storesSize = stores.length;
// Get the schema of the store field
Schema.Field storeField = schema.getField("store");
Schema storeSchema = storeField.schema();
GenericRecord storeRecord = new GenericData.Record(storeSchema);
String storeNumber = Math.round(Math.random() * 9000 + 1000) + "";
String address = provinceName + cityName;
String storeName = provinceName + cityName + stores[rnd.nextInt(storesSize)];
storeRecord.put("store_number",storeNumber);
storeRecord.put("store_name",storeName);
storeRecord.put("address",address);
int itemsSize = items.length;
int itemsTypeSize = itemsType.length;
List<GenericRecord> productRecordList = new ArrayList<GenericRecord>();
// Get the schema of the products field
Schema.Field productField = schema.getField("products");
Schema productSchema = productField.schema();
for (int i=0; i< 10; i++){
// Pick a random index directly within the array bounds
String itemName = items[rnd.nextInt(itemsSize)];
String itemType = itemsType[rnd.nextInt(itemsTypeSize)];
String quantity = String.valueOf(rnd.nextInt(100));
String price = String.valueOf(rnd.nextFloat()*100);
String discount = String.valueOf(rnd.nextFloat());
GenericRecord productRecord = new GenericData.Record(productSchema);
productRecord.put("item",itemName);
productRecord.put("item_type",itemType);
productRecord.put("quantity",quantity);
productRecord.put("price",price);
productRecord.put("discount",discount);
productRecordList.add(productRecord);
}
record.put("transaction_id",transactionID);
record.put("transaction_date",transactionDate);
record.put("invoice_id",invoiceId);
record.put("invoice_number",invoiceNumber);
record.put("telephone",telephone);
record.put("payer",payer);
record.put("store",storeRecord);
record.put("operator",operator);
record.put("terminal_number",terminalNumber);
record.put("products",productRecordList);
record.put("account",account);
record.put("tc_number",tcNumber);
LOG.info(record.toString());
} catch (IOException e) {
e.printStackTrace();
}
}
}, 0, 1000);
}
}
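The simulator builds values such as the tax number from groups produced by Math.round(Math.random() * 9000 + 1000). Because of the rounding, that expression can occasionally yield 10000, which has five digits. If strictly four-digit groups are wanted, a bounded random integer is one alternative. The sketch below is a suggestion rather than part of the original project, and the names DigitGroups, fourDigits, and fourGroups are made up.

```java
import java.util.Random;
import java.util.StringJoiner;

// Hypothetical helper for generating four-digit groups; not part of the dBD project.
final class DigitGroups {
    /** Returns a random integer in [1000, 9999], so it always has four digits. */
    static int fourDigits(Random rnd) {
        return 1000 + rnd.nextInt(9000);
    }

    /** Joins four 4-digit groups with spaces, e.g. for a tax or account number. */
    static String fourGroups(Random rnd) {
        StringJoiner joiner = new StringJoiner(" ");
        for (int i = 0; i < 4; i++) {
            joiner.add(Integer.toString(fourDigits(rnd)));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints something like "4821 1937 7706 3052"; the digits differ on every run.
        System.out.println(fourGroups(new Random()));
    }
}
```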