Hadoop Study Notes 2 - A Simple MapReduce Example
程序员文章站
2022-03-16 11:04:21
1.2 A MapReduce Development Example
The MapReduce execution flow is shown in the figure below: the Mapper first performs the map computation, grouping the data by key, and the Reducer then aggregates the grouped results.
Straight to the code:
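Before looking at the Hadoop code, the map-group-reduce flow can be illustrated with a small plain-Java simulation. This is an illustrative sketch only, not the Hadoop API: `LocalWordCount` and its methods are hypothetical names, and the shuffle phase is approximated by grouping into a sorted map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCount {

    // Simulates the map phase: emit a (word, 1) pair for every word in every line.
    static List<Map.Entry<String, Long>> map(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                pairs.add(Map.entry(word, 1L));
            }
        }
        return pairs;
    }

    // Simulates shuffle + reduce: group pairs by key and sum the counts.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            result.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        System.out.println(reduce(map(lines)));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

In a real cluster the map calls run in parallel across splits of the input, and the framework performs the grouping; the per-key logic, however, is the same as in the `WCMapper` and `WCReducer` below.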
package com.itbuilder.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // Build a Job object
        Job job = Job.getInstance(new Configuration());
        // Note: pass the class that contains the main method
        job.setJarByClass(WordCount.class);

        // Configure the Mapper
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        // Configure the Reducer
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
    }

    public static class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split each input line on spaces and emit (word, 1) for every word
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    public static class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the counts emitted for the same word
            long counter = 0;
            for (LongWritable count : values) {
                counter += count.get();
            }
            context.write(key, new LongWritable(counter));
        }
    }
}
Note:
WCMapper and WCReducer, being inner classes, must be declared static; otherwise the framework cannot instantiate them via reflection.
Jar dependencies in pom.xml:
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-client</artifactId>
        <version>2.7.1</version>
    </dependency>
</dependencies>
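With these dependencies in place, a typical way to build and submit the job might look like the following sketch. The jar name and HDFS paths here are hypothetical, and note that the output directory must not exist before the job runs, or Hadoop will refuse to start it.

```shell
# Build the jar with Maven (assumes a standard Maven project layout)
mvn clean package

# Upload local input to HDFS (hypothetical paths)
hadoop fs -mkdir -p /wordcount/input
hadoop fs -put words.txt /wordcount/input

# Submit the job: args[0] = input path, args[1] = output path
hadoop jar target/wordcount.jar com.itbuilder.hadoop.mr.WordCount \
    /wordcount/input /wordcount/output

# Inspect the result
hadoop fs -cat /wordcount/output/part-r-00000
```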