Compiling and running the first Hadoop program: WordCount.java

Java is the standard, official language for Hadoop development. In this post I download the official WordCount.java, compile and package it, and then run the resulting Hadoop job on some test data.

This assumes a working Hadoop environment is already set up, i.e. the hadoop command runs normally on Linux.

Download the Java version of the WordCount.java program.
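
For reference, the WordCount v1.0 program from the official MapReduce tutorial (linked at the end of this post) looks roughly like the sketch below, written against the old org.apache.hadoop.mapred API; the file you actually download should be close to this, but treat it as an approximation rather than a verbatim copy.

package org.myorg;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: split each input line on whitespace and emit (word, 1) for every token.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer: sum the 1s emitted for each word to get its total count.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] is the HDFS input directory, args[1] the HDFS output directory.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}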

Copy WordCount.java to a directory on the Linux machine; here I copy it to /home/crazyant/hadoop_wordcount:

[crazyant@dev.mechine hadoop_wordcount]$ ll
total 4
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

In that directory (/home/crazyant/hadoop_wordcount), create a wordcount_classes directory to hold the class files produced by compiling WordCount.java:

[crazyant@dev.mechine hadoop_wordcount]$ mkdir wordcount_classes
[crazyant@dev.mechine hadoop_wordcount]$ ll
total 8
drwxrwxr-x 2 crazyant crazyant 4096 Aug 16 20:07 wordcount_classes
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

Compile WordCount.java. The -classpath option points at the official Hadoop jar (the exact jar name and location depend on your Hadoop installation), and -d specifies the target directory for the generated class files:

[crazyant@dev.mechine hadoop_wordcount]$ javac -classpath /home/crazyant/app/hadoop/hadoop-2-core.jar -d wordcount_classes WordCount.java
[crazyant@dev.mechine hadoop_wordcount]$ ll -R
.:
total 8
drwxrwxr-x 3 crazyant crazyant 4096 Aug 16 20:09 wordcount_classes
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

./wordcount_classes:
total 4
drwxrwxr-x 3 crazyant crazyant 4096 Aug 16 20:09 org

./wordcount_classes/org:
total 4
drwxrwxr-x 2 crazyant crazyant 4096 Aug 16 20:09 myorg

./wordcount_classes/org/myorg:
total 12
-rw-rw-r-- 1 crazyant crazyant 1546 Aug 16 20:09 WordCount.class
-rw-rw-r-- 1 crazyant crazyant 1938 Aug 16 20:09 WordCount$Map.class
-rw-rw-r-- 1 crazyant crazyant 1611 Aug 16 20:09 WordCount$Reduce.class

Then package the compiled class files into a jar:

[crazyant@dev.mechine hadoop_wordcount]$ jar -cvf wordcount.jar -C wordcount_classes/ .
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/(in = 0) (out= 0)(stored 0%)
adding: org/myorg/WordCount$Map.class(in = 1938) (out= 798)(deflated 58%)
adding: org/myorg/WordCount$Reduce.class(in = 1611) (out= 649)(deflated 59%)
adding: org/myorg/WordCount.class(in = 1546) (out= 749)(deflated 51%)
[crazyant@dev.mechine hadoop_wordcount]$ ll
total 12
drwxrwxr-x 3 crazyant crazyant 4096 Aug 16 20:09 wordcount_classes
-rw-rw-r-- 1 crazyant crazyant 3169 Aug 16 20:11 wordcount.jar
-rwxr--r-- 1 crazyant crazyant 1921 Aug 16 20:03 WordCount.java

Generate a file locally with echo to serve as the input data:

[crazyant@dev.mechine hadoop_wordcount]$ echo "hello world, hello crazyant, i am the ant, i am your brother" > inputfile
[crazyant@dev.mechine hadoop_wordcount]$ more inputfile
hello world, hello crazyant, i am the ant, i am your brother

Create a directory on HDFS, with an input subdirectory inside it for the input file:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -mkdir /app/word_count/input
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count
Found 1 items
drwxr-xr-x   3 czt czt          0 2013-08-16 20:16 /app/word_count/input

Upload the inputfile we just wrote to the input directory on HDFS:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -put inputfile /app/word_count/input
[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count/input
Found 1 items
-rw-r--r--   3 czt czt         61 2013-08-16 20:18 /app/word_count/input/inputfile

Run the jar, passing the main class org.myorg.WordCount, the input directory we just created, and an output directory:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop jar wordcount.jar org.myorg.WordCount /app/word_count/input /app/word_count/output
13/08/16 20:19:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/08/16 20:19:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/16 20:19:40 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
13/08/16 20:19:40 INFO compress.LzmaCodec: Successfully loaded & initialized native-lzma library
13/08/16 20:19:40 INFO compress.QuickLzCodec: Successfully loaded & initialized native-quicklz library
13/08/16 20:19:40 INFO mapred.FileInputFormat: Total input paths to process : 1
13/08/16 20:19:41 INFO mapred.JobClient: splits size : 61
13/08/16 20:19:41 INFO mapred.JobClient: Running job: job_20130813122541_105844
13/08/16 20:19:43 INFO mapred.JobClient:  map 0% reduce 0%
13/08/16 20:19:57 INFO mapred.JobClient:  map 24% reduce 0%
13/08/16 20:20:07 INFO mapred.JobClient:  map 93% reduce 0%
13/08/16 20:20:16 INFO mapred.JobClient:  map 100% reduce 1%
13/08/16 20:20:26 INFO mapred.JobClient:  map 100% reduce 61%
13/08/16 20:20:36 INFO mapred.JobClient:  map 100% reduce 89%
13/08/16 20:20:47 INFO mapred.JobClient:  map 100% reduce 96%
13/08/16 20:20:57 INFO mapred.JobClient:  map 100% reduce 98%
13/08/16 20:21:00 INFO mapred.JobClient: Updating completed job! Ignoring ...
13/08/16 20:21:00 INFO mapred.JobClient: Updating completed job! Ignoring ...
13/08/16 20:21:00 INFO mapred.JobClient: Job complete: job_20130813122541_105844
13/08/16 20:21:00 INFO mapred.JobClient: Counters: 19
13/08/16 20:21:00 INFO mapred.JobClient:   File Systems
13/08/16 20:21:00 INFO mapred.JobClient:     HDFS bytes read=1951
13/08/16 20:21:00 INFO mapred.JobClient:     HDFS bytes written=68
13/08/16 20:21:00 INFO mapred.JobClient:     Local bytes read=5174715
13/08/16 20:21:00 INFO mapred.JobClient:     Local bytes written=256814
13/08/16 20:21:00 INFO mapred.JobClient:   Job Counters
13/08/16 20:21:00 INFO mapred.JobClient:     Launched reduce tasks=100
13/08/16 20:21:00 INFO mapred.JobClient:     Rack-local map tasks=61
13/08/16 20:21:00 INFO mapred.JobClient:     ORIGINAL_REDUCES=100
13/08/16 20:21:00 INFO mapred.JobClient:     Launched map tasks=61
13/08/16 20:21:00 INFO mapred.JobClient:     MISS_SCHEDULED_REDUCES=15
13/08/16 20:21:00 INFO mapred.JobClient:   TASK_STATISTICS
13/08/16 20:21:00 INFO mapred.JobClient:     Total Map Slot Time=34
13/08/16 20:21:00 INFO mapred.JobClient:     Attempt_0 Map Task Count=61
13/08/16 20:21:00 INFO mapred.JobClient:     Total Reduce Slot Time=892
13/08/16 20:21:00 INFO mapred.JobClient:   Map-Reduce Framework
13/08/16 20:21:00 INFO mapred.JobClient:     Reduce input groups=9
13/08/16 20:21:00 INFO mapred.JobClient:     Combine output records=0
13/08/16 20:21:00 INFO mapred.JobClient:     Map input records=1
13/08/16 20:21:00 INFO mapred.JobClient:     Reduce output records=9
13/08/16 20:21:00 INFO mapred.JobClient:     Map input bytes=61
13/08/16 20:21:00 INFO mapred.JobClient:     Combine input records=0
13/08/16 20:21:00 INFO mapred.JobClient:     Reduce input records=9

Check whether the output directory contains results:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -ls /app/word_count/output
Found 100 items
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00000
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00001
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00002
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00003
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00004
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00005
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00006
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00007
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00008
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00009
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00010
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00011
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00012
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00013
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00014
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00015
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00016
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00017
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00018
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00019
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00020
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00021
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00022
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00023
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00024
-rw-r--r--   3 czt czt          0 2013-08-16 20:20 /app/word_count/output/part-00025
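
The job produced 100 part files (most of them empty) because the reducer count is not set explicitly in the program, so the cluster default of 100 reducers was used, as the "Launched reduce tasks=100" counter above shows. If a single output file is preferred, one hypothetical tweak is to fix the reducer count in the driver before submitting the job:

conf.setNumReduceTasks(1); // run a single reducer so all results land in one part file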

Merge all the files in that directory and download the result to the local machine:

[crazyant@dev.mechine hadoop_wordcount]$ hadoop fs -getmerge /app/word_count/output wordcount_result
[crazyant@dev.mechine hadoop_wordcount]$ ls
inputfile  wordcount_classes  wordcount.jar  WordCount.java  wordcount_result

Take a look at the downloaded result:

[crazyant@dev.mechine hadoop_wordcount]$ more wordcount_result
i       2
your    1
crazyant,       1
brother 1
hello   2
am      2
world,  1
the     1
ant,    1

The counts are correct. Note that tokens such as "world," and "crazyant," keep their trailing commas because the mapper splits each line on whitespace only and does not strip punctuation.

Reference: http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html#Example%3A+WordCount+v1.0