Big Data with Hadoop: MapReduce Combiner Hands-On Example

Programmer Article Station (程序员文章站), 2022-04-28 16:30:34

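1. Requirement

During the word count, aggregate the output of each MapTask locally to reduce the amount of data sent over the network; that is, use the Combiner feature.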

(1) Input data

atguigu atguigu
ss ss
cls cls
jiao
banzhang
xue
hadoop

(2) Expected output

Expected: the Combiner receives many input records, merges the records that share a key, and emits fewer output records.
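The expected reduction can be illustrated without a cluster. The following is a minimal sketch in plain Java, assuming a single map task reads the whole sample input; the class `CombinerEffectSketch` and its methods are hypothetical and only mimic the map and combine steps in memory. The counters of a real job may differ, since Hadoop decides when and how often the combiner runs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone simulation; it does not use or replace the Hadoop job.
class CombinerEffectSketch {

    // Mimics the map step: one (word, 1) pair per space-separated token.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Mimics the combine step: sums the values of identical keys locally.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("atguigu atguigu", "ss ss", "cls cls",
                "jiao", "banzhang", "xue", "hadoop");
        List<Map.Entry<String, Integer>> mapped = mapPhase(input);
        Map<String, Integer> combined = combine(mapped);

        System.out.println("records before combine: " + mapped.size());   // 10
        System.out.println("records after combine:  " + combined.size()); // 7
        System.out.println(combined);
    }
}
```

For this input, the ten (word, 1) pairs collapse into seven (word, count) pairs, which is the kind of reduction the requirement asks for.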

2. Requirement analysis (we use Option 1)

[Figure: requirement analysis diagram for the Combiner example; image not available]

3. Hands-on implementation

Combiner

package com.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The Combiner extends Reducer and locally aggregates the output of each MapTask
public class WCCombiner extends Reducer<Text, IntWritable, Text, IntWritable>{
	
	// Reusable output value, so no new object is allocated per key
	private final IntWritable v = new IntWritable();

	@Override
	protected void reduce(Text k, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		// Running total for the current key
		int sum = 0;
		
		// Add up all counts emitted for this key
		for (IntWritable intWritable : values) {
			sum += intWritable.get();
		}
		
		// Write out (word, partial sum)
		v.set(sum);
		context.write(k, v);
	}
}
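A combiner like the one above is safe because summing partial sums gives the same total as summing everything at once, and Hadoop does not guarantee how many times a combiner runs. The sketch below, in plain Java with hypothetical names and made-up values, contrasts this with averaging, where combining partial results changes the answer. It is only an illustration of the arithmetic, not part of the job.

```java
// Hypothetical illustration: why a sum can be combined locally but a mean cannot.
class CombinerSafetySketch {

    static double sum(double... values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(double... values) {
        return sum(values) / values.length;
    }

    public static void main(String[] args) {
        // Assumed values for one key, split across two map tasks: {1, 2, 3} and {10}.
        double sumDirect = sum(1, 2, 3, 10);                  // 16.0
        double sumOfSums = sum(sum(1, 2, 3), sum(10));        // 6.0 + 10.0 = 16.0

        double meanDirect = mean(1, 2, 3, 10);                // 16.0 / 4 = 4.0
        double meanOfMeans = mean(mean(1, 2, 3), mean(10));   // (2.0 + 10.0) / 2 = 6.0

        System.out.println("sum matches:  " + (sumDirect == sumOfSums));     // true
        System.out.println("mean matches: " + (meanDirect == meanOfMeans));  // false
    }
}
```

Word count only needs sums, so using a combiner here leaves the final result unchanged.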

Mapper

package com.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
	
	// Reusable output key and a constant output value of 1
	Text k = new Text();
	IntWritable v = new IntWritable(1);
	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		// 1. Read the current line
		String line = value.toString();
		
		// 2. Split the line into words on single spaces
		String[] words = line.split(" ");
		
		// 3. Wrap each word in the output key
		// 4. Write out (word, 1) for every word
		for (String word : words) {
			k.set(word);
			context.write(k, v);
		}
	}
}
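The mapper splits on a single space, which fits the sample input. If the input could contain consecutive spaces or tabs, `split(" ")` would produce empty tokens that get counted as words. A minimal sketch of a more tolerant tokenizer follows; the class and method names are hypothetical, and whether this is needed depends on the actual input data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a line on runs of whitespace and skips empty tokens.
class TokenizeSketch {

    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            // An empty or blank line still yields one empty token, so filter it out
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Two spaces between the words
        String line = "atguigu  atguigu";
        System.out.println(line.split(" ").length);  // 3, includes one empty token
        System.out.println(tokenize(line).size());   // 2
    }
}
```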

Reducer

package com.mapreduce.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, IntWritable, Text, IntWritable>{
	
	// Reusable output value, so no new object is allocated per key
	private final IntWritable v = new IntWritable();

	@Override
	protected void reduce(Text k, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
		// 1. Define a variable that accumulates the sum
		int sum = 0;
		
		// 2. Iterate over the values and add them up
		for (IntWritable intWritable : values) {
			sum += intWritable.get();
		}
		
		// 3. Write out (word, total count)
		v.set(sum);
		context.write(k, v);
	}
}

Driver

package com.mapreduce.wordcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WCDriver {
	public static void main(String[] args) throws Exception {
		// Hard-coded local test paths; these override any command-line arguments
		args = new String[] { "D:\\hadoop-2.7.1\\winMR\\WordCount\\input",
				"D:\\hadoop-2.7.1\\winMR\\WordCount\\output2" };

		// 1. Get the job instance
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);

		// 2. Set the jar by the driver class
		job.setJarByClass(WCDriver.class);

		// 3. Set the mapper and reducer classes
		job.setMapperClass(WCMapper.class);
		job.setReducerClass(WCReducer.class);

		// 4. Set the key/value types of the map output
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		// 5. Set the key/value types of the final output
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// 6. Set the Combiner
		job.setCombinerClass(WCCombiner.class);
		
		// 7. Set the input and output paths (the output directory must not exist yet)
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// 8. Submit the job, wait for it to finish, and report success via the exit code
		boolean succeeded = job.waitForCompletion(true);
		System.exit(succeeded ? 0 : 1);
	}
}

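4. Output

[Figure: job output showing the Combiner input and output record counts; image not available]
With the Combiner set, its input and output record counts (the Combine input records and Combine output records counters) are no longer 0. You can compare this run with the WordCount example that does not set a Combiner: the final result is the same in both cases.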
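The claim that the result is the same with and without a Combiner can be checked on a small scale. The sketch below is plain Java with hypothetical names, and it assumes the sample words are split across two map tasks (the split itself is made up). It compares counting all words at once with counting per task first and then merging the partial counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory comparison; it does not run a Hadoop job.
class CombinerEquivalenceSketch {

    // Counts each word once per occurrence, like map followed by a summing step.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Merges partial counts, like the reducer summing the combiner output.
    static Map<String, Integer> mergeCounts(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed split of the sample words across two map tasks
        List<String> task1 = Arrays.asList("atguigu", "atguigu", "ss", "cls", "jiao");
        List<String> task2 = Arrays.asList("ss", "cls", "banzhang", "xue", "hadoop");

        // Without a combiner: the reducer sums every (word, 1) pair
        List<String> all = new ArrayList<>(task1);
        all.addAll(task2);
        Map<String, Integer> direct = countWords(all);

        // With a combiner: each task pre-aggregates, then the reducer sums the partial counts
        Map<String, Integer> viaCombiner =
                mergeCounts(Arrays.asList(countWords(task1), countWords(task2)));

        System.out.println(direct.equals(viaCombiner)); // true
    }
}
```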
