Running MapReduce Programs from MyEclipse on Windows

程序员文章站 (Programmer Article Station), 2024-02-02 22:59:46

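A Hadoop environment must already be set up in the virtual machine

If you have not set one up yet, see my blog post on setting up Hadoop.

The firewalls on both Windows and the virtual machine must be turned off

Hadoop reports the following error on startup: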

2012-09-18 13:42:38,901 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.240:9000. Already tried 0 time(s).
2012-09-18 13:42:39,904 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.1.240:9000. Already tried 1 time(s).
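This is most likely caused by a mismatch between the IP address and the hostname, or by the firewalls on the two sides still being on.

Commands to turn off the firewall on CentOS 6

# Stop the firewall for the current session
service iptables stop
# Keep it from starting at boot
chkconfig iptables off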

Turning off the firewall on CentOS 7, where firewalld is the default firewall

# Stop the firewall for the current session
systemctl stop firewalld
# Keep it from starting at boot
systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
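Once the firewalls are off, it can help to confirm from the Windows side that the NameNode port really accepts connections. The following is a minimal sketch that uses only the JDK; the class name PortCheck and the 3-second timeout are illustrative choices, not part of Hadoop, and the host and port are whatever your cluster uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host name not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Usage: PortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable = " + isReachable(host, port, 3000));
    }
}
```

If this prints false while Hadoop is running, the "Retrying connect to server" messages above are probably a network or firewall problem rather than a Hadoop configuration problem.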

Change the port in Hadoop's core-site.xml from 9000 to 8020

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://master:8020</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/usr/local/hadoop/tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
</configuration>
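To double-check which NameNode address a copy of core-site.xml actually contains, a small sketch like the one below can read it with the JDK's XML parser. It deliberately does not use Hadoop's own Configuration class, so it runs without any Hadoop jars; the class name CoreSiteReader and the file path passed on the command line are assumptions for illustration.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class CoreSiteReader {
    /** Returns the <value> of the <property> whose <name> matches, or null if absent. */
    static String readProperty(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            Node nameNode = property.getElementsByTagName("name").item(0);
            Node valueNode = property.getElementsByTagName("value").item(0);
            if (nameNode != null && valueNode != null
                    && name.equals(nameNode.getTextContent().trim())) {
                return valueNode.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("Usage: CoreSiteReader <path-to-core-site.xml>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String defaultFs = readProperty(in, "fs.defaultFS");
            if (defaultFs == null) {
                System.out.println("fs.defaultFS is not set in this file");
                return;
            }
            URI uri = URI.create(defaultFs);
            System.out.println("NameNode host = " + uri.getHost() + ", port = " + uri.getPort());
        }
    }
}
```

For the file shown above this should report host master and port 8020, which are the values the Windows client has to be able to reach.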

Configuration on Windows

Turn off the Windows firewall

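Extract the Hadoop release that matches your cluster's version to a local disk
- ==The path must not contain spaces or Chinese characters==
Prepare the two files hadoop.dll and winutils.exe
Put hadoop.dll and winutils.exe into the bin directory of the Hadoop installation on Windows, and also copy hadoop.dll to C:\Windows\System32. My Hadoop version is 2.6; if yours is the same, you can download the files from my CSDN page.
Configure the Hadoop environment variables on Windows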

HADOOP_HOME:
D:\D\hadoop-2.6.0
Append to PATH (directories only; winutils.exe is picked up from %HADOOP_HOME%\bin):
%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin
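A quick way to confirm that the variables are visible to a Java process and that the two helper files are in place is a check like the following. This is only a sketch; the class name HadoopHomeCheck is hypothetical, and it checks nothing beyond the existence of the two files named above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class HadoopHomeCheck {
    /** Lists which of the expected Windows helper files are missing from hadoopHome\bin. */
    static List<String> missingFiles(Path hadoopHome) {
        List<String> missing = new ArrayList<>();
        for (String name : new String[] {"winutils.exe", "hadoop.dll"}) {
            if (!Files.isRegularFile(hadoopHome.resolve("bin").resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            System.out.println("HADOOP_HOME is not set in this process's environment");
            return;
        }
        List<String> missing = missingFiles(Paths.get(home));
        System.out.println(missing.isEmpty()
                ? "bin contains winutils.exe and hadoop.dll"
                : "Missing from bin: " + missing);
    }
}
```

An IDE that was already open when the variables were added keeps its old environment, so run the check from a freshly started IDE or console.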

In C:\Windows\System32\drivers\etc\hosts on Windows, add a mapping for the Linux host's IP address

    192.168.1.10    master

Rename the Windows administrator account (to hadoop, the user name used on the cluster)

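Finally, log off or restart the computer so that the administrator account actually uses the new name.

Copy the hadoop-eclipse-plugin-2.6.0 plugin (available from my CSDN downloads) into D:\Myeclipse\MyEclipse Professional 2014\plugins, then restart the IDE for it to take effect.

Now start MyEclipse or Eclipse

Once the plugin is installed, a Map/Reduce icon appears in the upper right corner of MyEclipse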
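Renaming the Windows account is one way to make the client's user name match the one on the cluster. With simple (non-Kerberos) authentication, Hadoop clients also consult the HADOOP_USER_NAME environment variable and, if that is unset, the Java system property of the same name. The sketch below assumes such a cluster; the class name HadoopUser is hypothetical.

```java
final class HadoopUser {
    static final String KEY = "HADOOP_USER_NAME";

    /**
     * Sets the HADOOP_USER_NAME system property unless the environment variable
     * already provides a user, and returns the name that will be in effect.
     */
    static String apply(String user) {
        String fromEnv = System.getenv(KEY);
        if (fromEnv != null && !fromEnv.isEmpty()) {
            return fromEnv; // the environment variable is checked before the property
        }
        System.setProperty(KEY, user);
        return user;
    }

    public static void main(String[] args) {
        // In a real job, call this first in main(), before creating a Configuration or Job.
        System.out.println("Effective Hadoop user: " + apply("hadoop"));
    }
}
```

Whether this is preferable to renaming the account depends on your setup; it only changes the name the client reports and does not touch the Windows account.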

Add the path of the Hadoop installation on Windows

Open the Map/Reduce view

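Note the fields marked in red in the screenshot above; these are the ones that matter.

Location Name: any name you like; it identifies a "Map/Reduce Location"
Map/Reduce Master
Host: 192.168.1.2 (the IP address of Master.Hadoop)
Port: 9001

DFS Master
Use M/R Master host: check this box (our NameNode and JobTracker run on the same machine).
Port: 9000 (this must match the port in fs.defaultFS in core-site.xml; if you changed it to 8020 as described earlier, enter 8020)

User name: hadoop (defaults to the Windows administrator account name; because we renamed the account earlier, it is hadoop here.)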

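Below is a WordCount demo

Copy the configured core-site.xml, hdfs-site.xml, and log4j.properties from the virtual machine into the project on Windows
This is the example program that ships with Hadoop, so its package name is org.apache.hadoop.examples and its class name is WordCount
Before running it, create the input folder in HDFS; the output folder is generated automatically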

[hadoop@master hadoop]$ hadoop fs -mkdir -p /user/hadoop/input

Upload some files to the input folder

[hadoop@master hadoop]$ hadoop fs -put NOTICE.txt README.txt LICENSE.txt /user/hadoop/input

Below is the WordCount program code

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public WordCount() {
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = (new GenericOptionsParser(conf, args)).getRemainingArgs();
        if(otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }

        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        for(int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }

        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true)?0:1);
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public IntSumReducer() {
        }

        public void reduce(Text key, Iterable<IntWritable> values, Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            int sum = 0;

            // Add up every count emitted for this key
            for (IntWritable val : values) {
                sum += val.get();
            }

            this.result.set(sum);
            context.write(key, this.result);
        }
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public TokenizerMapper() {
        }

        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());

            while(itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }

        }
    }
}
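To see what the job computes without involving a cluster, the same logic can be sketched in plain Java. This is not how Hadoop executes the job; it only mirrors the two steps above (the mapper emits (word, 1) for each token, the reducer sums the values per word) in a single in-memory loop. The class name LocalWordCount is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class LocalWordCount {
    /** Counts whitespace-separated tokens across all lines, sorted by word. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            // Same tokenizer as TokenizerMapper: splits on whitespace only.
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                // "Emit (word, 1)" and "sum per key" collapsed into one step.
                totals.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(count(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

Because the tokenizer splits on whitespace only, punctuation stays attached to words, so "Hadoop," and "Hadoop" are counted separately, both here and in the real job.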

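Right-click the WordCount.java file you just created and choose Run As -> Run Configurations, where you can set the runtime parameters (if WordCount is not listed under Java Application, double-click Java Application first). Switch to the "Arguments" tab and enter "input output" in the Program arguments field.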

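* Alternatively, set the input arguments directly in the code. In main(), change the line String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); to:
String[] otherArgs = new String[]{"input", "output"}; /* set the input arguments directly */
After setting the arguments, run the program again. You should see a message that the job succeeded, and after refreshing DFS Locations the output folder appears.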
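Hard-coding the arguments means the Run Configurations settings are ignored from then on. A possible middle ground, sketched below with a hypothetical helper class JobArgs, is to fall back to the defaults only when fewer than two arguments were supplied. Relative paths such as input and output are resolved against the user's HDFS home directory, which is /user/hadoop for the hadoop user in this setup.

```java
import java.util.Arrays;

final class JobArgs {
    /** Returns args unchanged if input and output were supplied, else the defaults. */
    static String[] orDefaults(String[] args) {
        return args.length >= 2 ? args : new String[] {"input", "output"};
    }

    public static void main(String[] args) {
        // With no program arguments this prints [input, output]
        System.out.println(Arrays.toString(orDefaults(args)));
    }
}
```

In WordCount this would be applied to the array returned by getRemainingArgs(), before the length check that prints the usage message.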

You can then see the automatically generated output folder under DFS Locations on the left; the results of the MapReduce job are in part-r-00000
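With the default text output format, each line of part-r-00000 holds a word, a tab character, and its count. If you copy the file to the local disk (for example with hadoop fs -get), a sketch like the following can load it into a map; the class name PartFileParser is hypothetical, and the sample lines in main are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PartFileParser {
    /** Parses "word<TAB>count" lines, keeping the order in which they appear. */
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines or lines without a separator
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Pass a local copy of part-r-00000 as the first argument, or run without
        // arguments to parse two sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("hadoop\t1", "hello\t2");
        System.out.println(parse(lines));
    }
}
```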

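Some minor issues

1. When the HDFS input path is created from a Windows-side terminal tool (e.g. Xshell or SecureCRT), the folder does not get execute (x) permission. Grant the permission on the folder:

hadoop fs -chmod -R 777 /user/hadoop/input

2. The Hadoop environment variables on Windows normally take effect as soon as they are configured; if they do not, try restarting the computer.
3. java.lang.IllegalArgumentException: java.net.UnknownHostException: user. This error occurs because no mapping was configured in C:\Windows\System32\drivers\etc\hosts on Windows. Add the Linux host's IP address there: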

    192.168.1.10    master
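To confirm that the hosts entry is picked up by Java, you can resolve the name directly. This is a minimal JDK-only sketch; the class name HostCheck is hypothetical, and master is the host name used in this article.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class HostCheck {
    /** Returns the IP address that host resolves to, or null if it cannot be resolved. */
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "master";
        String ip = resolve(host);
        System.out.println(ip == null
                ? host + " does not resolve; check the hosts file"
                : host + " -> " + ip);
    }
}
```

If the name does not resolve here, the MapReduce client will fail with the same UnknownHostException.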
