Submitting Jobs to a Hadoop Cluster from Eclipse on a Mac

程序员文章站 2024-02-25 10:44:28


Cluster setup: VirtualBox + Ubuntu 14.04 + Hadoop 2.6.0

With the cluster running, install Eclipse on the Mac and connect it to the Hadoop cluster.

1. Accessing the cluster

1.1 Editing the Mac's hosts file

Add the Master's IP address to /etc/hosts on the Mac:

##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1    localhost
255.255.255.255    broadcasthost
::1             localhost

192.168.56.101  Master # added: the Master's IP
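A hosts line maps host names to an address. The rough sketch below shows how such a line is read. It uses plain Java with no Hadoop, and the class HostsLine is hypothetical, not part of any tool. Text after # is a comment, the first field is the IP, and the remaining fields are names. Names are compared case-insensitively, which is why the lowercase master in the browser URL still matches Master.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: reads /etc/hosts-style lines into (host name -> IP) mappings. */
public class HostsLine {

    /** Returns lowercase host name -> IP for one line; empty map for blank or comment-only lines. */
    public static Map<String, String> parse(String line) {
        // Everything after '#' is a comment in the hosts format
        int hash = line.indexOf('#');
        String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
        if (content.isEmpty()) {
            return Collections.emptyMap();
        }
        // Fields are separated by any run of spaces or tabs
        String[] fields = content.split("\\s+");
        Map<String, String> names = new LinkedHashMap<>();
        // fields[0] is the address; every following field is a name for it
        for (int i = 1; i < fields.length; i++) {
            names.put(fields[i].toLowerCase(Locale.ROOT), fields[0]);
        }
        return names;
    }

    /** Case-insensitive lookup; the first matching line wins. Returns null if no line matches. */
    public static String lookup(List<String> lines, String host) {
        String key = host.toLowerCase(Locale.ROOT);
        for (String line : lines) {
            String ip = parse(line).get(key);
            if (ip != null) {
                return ip;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "127.0.0.1    localhost",
            "192.168.56.101  Master # added: the Master's IP");
        System.out.println(lookup(lines, "master")); // prints 192.168.56.101
    }
}
```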

1.2 Accessing the cluster

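Start the cluster on the Master.
On the Mac, open http://master:50070/ (the NameNode web UI).
If the page loads and shows the cluster information, access is working.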
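To run the same check from code instead of a browser, the following sketch requests the NameNode web UI and treats any 2xx or 3xx response as reachable. The class WebUiCheck is hypothetical and uses only the JDK. Port 50070 is the Hadoop 2.x NameNode HTTP port used above, and the 3-second timeout in main is an arbitrary choice.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper: checks whether the NameNode web UI answers over HTTP. */
public class WebUiCheck {

    /** Builds the web UI address, e.g. http://master:50070/ */
    public static String namenodeUiUrl(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    /** True if the URL answers with a 2xx/3xx status within the timeout; false on any error. */
    public static boolean isUp(String url, int timeoutMs) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMs);
            conn.setReadTimeout(timeoutMs);
            int code = conn.getResponseCode();
            return code >= 200 && code < 400;
        } catch (IOException | IllegalArgumentException e) {
            // Malformed URL, unknown host, refused connection or timeout: not reachable
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        String url = namenodeUiUrl("master", 50070);
        System.out.println(url + " reachable: " + isUp(url, 3000));
    }
}
```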

2. Downloading and installing Eclipse

Eclipse IDE for Java Developers

http://www.eclipse.org/downloads/package...

3. Configuring Eclipse

3.1 Setting up Hadoop-Eclipse-Plugin

3.1.1 Downloading Hadoop-Eclipse-Plugin

Download hadoop2x-eclipse-plugin from GitHub (alternative download: http://pan.baidu.com/s/1i4ikIoP).

3.1.2 Installing Hadoop-Eclipse-Plugin

In Applications, find Eclipse, right-click it and choose Show Package Contents.


Copy the plugin into the plugins directory, then reopen Eclipse.


3.2 Connecting to the Hadoop cluster

3.2.1 Setting the Hadoop installation directory

Extract the Hadoop distribution into any directory. No configuration is needed; just point Eclipse at that directory.
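Nothing in this directory is configured, so what Eclipse presumably needs from it is the Hadoop jars. The sketch below sanity-checks the directory. It assumes the directory is an unpacked Hadoop 2.x binary distribution, which keeps its jars under share/hadoop/. HadoopHomeCheck is a hypothetical name, and the directory name hadoop-2.6.0 in main is just an example default.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: checks that an unpacked Hadoop 2.x directory contains its jar folders. */
public class HadoopHomeCheck {

    // Jar directories found in a Hadoop 2.x binary distribution
    public static final List<String> JAR_DIRS = Arrays.asList(
        "share/hadoop/common", "share/hadoop/hdfs",
        "share/hadoop/mapreduce", "share/hadoop/yarn");

    /** Returns the expected jar directories that do not exist under hadoopHome. */
    public static List<String> missingDirs(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String dir : JAR_DIRS) {
            if (!Files.isDirectory(hadoopHome.resolve(dir))) {
                missing.add(dir);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path home = Paths.get(args.length > 0 ? args[0] : "hadoop-2.6.0");
        List<String> missing = missingDirs(home);
        System.out.println(missing.isEmpty() ? "looks like a Hadoop 2.x directory" : "missing: " + missing);
    }
}
```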


3.2.2 Configuring the cluster address

Click the plus icon in the upper-right corner.


Open the Map/Reduce perspective.


Select Map/Reduce Locations, right-click, and choose New Hadoop location.


Fill in Location name, Host, the Port under DFS Master, and User name. A Host of Master resolves to the IP set in the Mac's hosts file. Click Finish when done.
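The DFS Master host and port should name the same NameNode address as fs.defaultFS in the cluster's core-site.xml. The sketch below builds the corresponding hdfs:// URI and compares it with an fs.defaultFS value. The class DfsMaster is hypothetical. Port 9000 is only an assumed example, since the article does not state the port; use the one from your cluster's configuration.

```java
import java.net.URI;

/** Hypothetical helper: relates the DFS Master host/port fields to an HDFS URI. */
public class DfsMaster {

    /** Builds hdfs://host:port/ from the DFS Master fields of the location dialog. */
    public static URI toUri(String host, int port) {
        return URI.create("hdfs://" + host + ":" + port + "/");
    }

    /** True if an fs.defaultFS value names the same host (case-insensitive) and port. */
    public static boolean matchesDefaultFs(String fsDefaultFs, String host, int port) {
        URI uri = URI.create(fsDefaultFs);
        return "hdfs".equals(uri.getScheme())
            && host.equalsIgnoreCase(uri.getHost())
            && port == uri.getPort();
    }

    public static void main(String[] args) {
        // 9000 is an assumed port: use the one in your cluster's fs.defaultFS
        System.out.println(toUri("Master", 9000));                                  // hdfs://Master:9000/
        System.out.println(matchesDefaultFs("hdfs://Master:9000", "Master", 9000)); // true
    }
}
```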


3.2.3 Browsing HDFS

Check that HDFS can now be browsed directly from Eclipse.


4. Running WordCount on the cluster

4.1 Creating the project

File -> New -> Other -> Map/Reduce Project

Enter the project name WordCount, then click Finish.

4.2 Creating the class

Create a class named WordCount in the package org.apache.hadoop.examples.

4.3 WordCount code

Copy the following code into WordCount.java:

package org.apache.hadoop.examples;
 
import java.io.IOException;
import java.util.StringTokenizer;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
 
public class WordCount {
 
  // Mapper: splits each input line on whitespace and emits (word, 1) for every token
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{
 
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
 
    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }
 
  // Reducer: adds up the 1s (or the combiner's partial sums) emitted for each word
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();
 
    public void reduce(Text key, Iterable<IntWritable> values, 
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
 
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
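    Job job = Job.getInstance(conf, "word count"); // new Job(conf, name) is deprecated in Hadoop 2.x
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // Summing is associative, so the reducer also works as a map-side combiner
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);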
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
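To predict what the job should produce for a small input, here is a Hadoop-free sketch of the same computation; the class LocalWordCount is hypothetical. It tokenizes with StringTokenizer exactly as TokenizerMapper does, then sums one count per token as IntSumReducer does. It collapses the map, combine and reduce phases into a single in-memory pass, so it only illustrates the result, not how Hadoop distributes the work.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical, Hadoop-free sketch of what the WordCount job computes. */
public class LocalWordCount {

    /** Tokenizes every line on whitespace and counts each token. */
    public static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps words sorted, like the keys in a single reducer's output
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Same tokenization as TokenizerMapper
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // Same effect as IntSumReducer adding up the emitted 1s
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

For example, an input file with the lines "hello hadoop" and "hello world" should give a job output containing hadoop 1, hello 2 and world 1, one tab-separated pair per line.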

4.4 Configuring Hadoop parameters

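Copy every configuration file you modified on the cluster, plus log4j.properties, into the project's src directory. Eclipse copies them to the output folder, which puts them on the classpath, and Hadoop's Configuration class looks for files such as core-site.xml there.

Here I copied slaves, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml.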
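To confirm that the copied files are actually visible to the program, the JDK-only sketch below asks the class loader for each file by name and lists the ones it cannot find. The class ClasspathConfigCheck is hypothetical. It assumes the files sit directly in src, so that Eclipse places them at the root of the output folder.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which config files are NOT visible on the classpath. */
public class ClasspathConfigCheck {

    public static List<String> missing(List<String> resources) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathConfigCheck.class.getClassLoader();
        }
        List<String> missing = new ArrayList<>();
        for (String name : resources) {
            // Files copied from src/ resolve at the classpath root, so look them up by bare name
            if (loader.getResource(name) == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("core-site.xml", "hdfs-site.xml",
            "mapred-site.xml", "yarn-site.xml", "log4j.properties");
        System.out.println("not on classpath: " + missing(files));
    }
}
```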

4.5 Configuring the HDFS input and output paths

Right-click WordCount.java and choose Run As > Java Application.


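The program does not run properly yet. With no arguments, WordCount prints "Usage: wordcount <in> <out>" and exits. Right-click again and choose Run As > Run Configurations.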

On the Arguments tab, enter the input and output paths under Program arguments, separated by a space.
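WordCount expects exactly two arguments to remain after GenericOptionsParser has removed any generic options. The sketch below splits the field on whitespace and applies the same count check. The class ProgramArgs is hypothetical, and so are the hdfs://Master:9000/user/hadoop/... paths in main: port 9000 and the directory names are assumptions. The split ignores quoting, which paths like these do not need.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring WordCount's argument check for the Program arguments field. */
public class ProgramArgs {

    /** Splits the field on runs of whitespace; an empty field gives no arguments. */
    public static String[] split(String programArguments) {
        String trimmed = programArguments.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Same condition as WordCount.main: exactly one input path and one output path. */
    public static boolean isValid(String[] args) {
        return args.length == 2;
    }

    public static void main(String[] args) {
        // Hypothetical example paths; adjust host, port and directories to your cluster
        String field = "hdfs://Master:9000/user/hadoop/input hdfs://Master:9000/user/hadoop/output";
        String[] parsed = split(field);
        System.out.println(Arrays.toString(parsed) + " valid: " + isValid(parsed));
    }
}
```

Also note that FileOutputFormat refuses to start a job whose output directory already exists. Delete the output path or choose a new one before each rerun.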


Once configured, click Run. At this point a Permission denied error appears.

5. Problems encountered while running

5.1 Permission denied

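The client lacks permission to access HDFS. Under Hadoop's default simple authentication, HDFS identifies the client by the Mac's login user name, and that user has no rights on the cluster's file system.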

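# Run on the Master (the NameNode host): HDFS resolves a user's groups there.
# Assumes the Mac user name is hadoop. Members of supergroup (the default superuser group) are HDFS superusers.
sudo groupadd supergroup                # create the supergroup group
sudo useradd -g supergroup hadoop       # create user hadoop with supergroup as its primary group
# (if user hadoop already exists: sudo usermod -a -G supergroup hadoop)

# Open the HDFS root to ALL users (777 = rwx for owner, group AND others, not only supergroup);
# run as the HDFS superuser, i.e. the user that started the NameNode
hadoop fs -chmod 777 /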
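The mode 777 consists of three octal digits, one each for the owner, the group and all other users. Each digit is the sum of read (4), write (2) and execute (1). The sketch below (hypothetical class PermissionBits) expands a mode into the rwx form shown by hadoop fs -ls. The expansion makes clear that the last digit opens / to every user, not only to supergroup members.

```java
/** Hypothetical helper: expands a 3-digit octal mode such as 777 into rwx form. */
public class PermissionBits {

    /** Converts e.g. "755" into "rwxr-xr-x" (owner, group, others). */
    public static String toSymbolic(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits: " + octal);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) {
            int bits = c - '0';
            // Each digit is read(4) + write(2) + execute(1) for one class of users
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic("777")); // rwxrwxrwx: owner, group and all other users
        System.out.println(toSymbolic("755")); // rwxr-xr-x
    }
}
```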

6. Viewing the Hadoop source code

6.1 Downloading the source

http://apache.claz.org/hadoop/common/had...

6.2 Attaching the source

In the search box in the upper-right corner, search for Open Type.


Enter NameNode and select the NameNode class. Its source cannot be viewed yet.

Click Attach Source > External location > External Folder, then choose the directory where the downloaded source was extracted.


