Big Data (3): Verifying a MapReduce Job on Hadoop and Common HDFS Commands
1. Verifying MapReduce
- Create a test.txt file locally:
vim test.txt
Enter a few English sentences, such as:
Beijing is the capital of China
I love Beijing
I love China
- Upload test.txt to the /user/input directory on HDFS:
- hdfs dfs -mkdir /user
- hdfs dfs -mkdir /user/input
- hdfs dfs -put test.txt /user/input
- Run the bundled MapReduce example program (a minimal sketch of the wordcount source appears at the end of this section).
- Replace /hadoop/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar with the corresponding path in your own installation.
hadoop jar /hadoop/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /user/input/test.txt output
- On success the command prints output like the following; the map progress runs from 0% to 100% first, then reduce runs from 0% to 100%.
18/11/08 16:18:48 INFO client.RMProxy: Connecting to ResourceManager at master.hadoop/172.16.16.15:8032
18/11/08 16:18:49 INFO input.FileInputFormat: Total input files to process : 1
18/11/08 16:18:49 INFO mapreduce.JobSubmitter: number of splits:1
18/11/08 16:18:49 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/11/08 16:18:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1541665050140_0001
18/11/08 16:18:49 INFO impl.YarnClientImpl: Submitted application application_1541665050140_0001
18/11/08 16:18:49 INFO mapreduce.Job: The url to track the job: http://master.hadoop:8088/proxy/application_1541665050140_0001/
18/11/08 16:18:49 INFO mapreduce.Job: Running job: job_1541665050140_0001
18/11/08 16:18:55 INFO mapreduce.Job: Job job_1541665050140_0001 running in uber mode : false
18/11/08 16:18:55 INFO mapreduce.Job: map 0% reduce 0%
18/11/08 16:19:00 INFO mapreduce.Job: map 100% reduce 0%
18/11/08 16:19:05 INFO mapreduce.Job: map 100% reduce 100%
18/11/08 16:19:06 INFO mapreduce.Job: Job job_1541665050140_0001 completed successfully
18/11/08 16:19:06 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=106
FILE: Number of bytes written=404235
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=175
HDFS: Number of bytes written=64
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1945
Total time spent by all reduces in occupied slots (ms)=1574
Total time spent by all map tasks (ms)=1945
Total time spent by all reduce tasks (ms)=1574
Total vcore-milliseconds taken by all map tasks=1945
Total vcore-milliseconds taken by all reduce tasks=1574
Total megabyte-milliseconds taken by all map tasks=1991680
Total megabyte-milliseconds taken by all reduce tasks=1611776
Map-Reduce Framework
Map input records=3
Map output records=12
Map output bytes=107
Map output materialized bytes=106
Input split bytes=116
Combine input records=12
Combine output records=9
Reduce input groups=9
Reduce shuffle bytes=106
Reduce input records=9
Reduce output records=9
Spilled Records=18
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=88
CPU time spent (ms)=890
Physical memory (bytes) snapshot=513982464
Virtual memory (bytes) snapshot=4242575360
Total committed heap usage (bytes)=316145664
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=59
File Output Format Counters
Bytes Written=64
- View the result:
hdfs dfs -ls output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2018-11-08 16:19 output/_SUCCESS
-rw-r--r-- 1 root supergroup 64 2018-11-08 16:19 output/part-r-00000
hdfs dfs -cat output/part-r-00000
Beijing 2
China 2
I 2
capital 1
is 1
love 2
of 1
the 1
- Download the output directory locally and inspect the result:
hdfs dfs -get output output
cat output/part-r-00000
This prints the same result.
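For reference, the wordcount example run above is the classic WordCount program. A minimal version, modeled on the WordCount v1.0 walkthrough in the official Hadoop MapReduce tutorial, looks roughly like this (the source bundled in the examples jar may differ in detail):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every whitespace-separated token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sum the counts for each word.
  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The map phase emits (word, 1) for every token, the combiner pre-aggregates on the map side (which is why the counters above show Combine input records=12 but Reduce input records=9), and the reducer sums the counts per word.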
2. Common HDFS Commands
Running hdfs dfs without arguments prints usage information for a long list of command options. The table below lists the supported options in full; a short programmatic sketch follows the table.
Option | Usage | Meaning
---|---|---
-ls | -ls <path> | List the directory contents at the given path
-lsr | -lsr <path> | Recursively list the directory tree at the given path
-du | -du <path> | Show the size of each file under the path
-dus | -dus <path> | Show the aggregate size of the files/directories under the path
-count | -count [-q] <path> | Count files and directories
-mv | -mv <source path> <destination path> | Move
-cp | -cp <source path> <destination path> | Copy
-rm | -rm [-skipTrash] <path> | Delete a file or an empty directory
-rmr | -rmr [-skipTrash] <path> | Delete recursively
-put | -put <one or more local (Linux) files> <HDFS path> | Upload files
-copyFromLocal | -copyFromLocal <one or more local (Linux) files> <HDFS path> | Copy from the local filesystem
-moveFromLocal | -moveFromLocal <one or more local (Linux) files> <HDFS path> | Move from the local filesystem
-getmerge | -getmerge <source path> <local (Linux) path> | Merge files and download to the local filesystem
-cat | -cat <HDFS path> | Print file contents
-text | -text <HDFS path> | Print file contents as text
-copyToLocal | -copyToLocal [-ignoreCrc] [-crc] [HDFS source path] [local (Linux) destination path] | Copy to the local filesystem
-moveToLocal | -moveToLocal [-crc] <HDFS source path> <local (Linux) destination path> | Move to the local filesystem
-mkdir | -mkdir <HDFS path> | Create an empty directory
-setrep | -setrep [-R] [-w] <replication factor> <path> | Change the replication factor
-touchz | -touchz <file path> | Create an empty file
-stat | -stat [format] <path> | Show file statistics
-tail | -tail [-f] <file> | Show the last part of a file
-chmod | -chmod [-R] <mode> [path] | Change permissions
-chown | -chown [-R] [owner][:[group]] path | Change owner
-chgrp | -chgrp [-R] group path | Change group
-help | -help [command option] | Show help
Note: the paths in the table above may be either HDFS paths or local Linux paths. Where ambiguity is likely, the usage column says "local (Linux)" or "HDFS" explicitly; where nothing is specified, an HDFS path is meant.
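When HDFS access is needed from application code rather than the shell, the same operations are available through the org.apache.hadoop.fs.FileSystem Java API. The sketch below is a minimal, illustrative mapping of a few of the table's commands onto that API; the paths are hypothetical, and it assumes a core-site.xml with fs.defaultFS is on the classpath:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOps {
  public static void main(String[] args) throws Exception {
    // Reads fs.defaultFS (and other settings) from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path dir = new Path("/user/input");               // hypothetical HDFS paths
    Path file = new Path("/user/input/test.txt");

    fs.mkdirs(dir);                                   // like: hdfs dfs -mkdir
    fs.copyFromLocalFile(new Path("test.txt"), file); // like: hdfs dfs -put

    // like: hdfs dfs -ls /user/input
    for (FileStatus st : fs.listStatus(dir)) {
      System.out.println(st.getPath() + "\t" + st.getLen());
    }

    // like: hdfs dfs -cat /user/input/test.txt
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file)))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }

    fs.setReplication(file, (short) 2);               // like: hdfs dfs -setrep 2
    fs.close();
  }
}
```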