Writing Hadoop MapReduce jobs in Python
程序员文章站
2022-04-19 18:42:36
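Hadoop version: 2.7.2
Local test:
cat input.txt | python map.py
Submitting to Hadoop:
hadoop jar ${HADOOP_HOME}/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar \
    -file map.py \
    -mapper 'python map.py' \
    -reducer cat \
    -input %s \
    -output %s
Replace the two %s placeholders with the HDFS input path and the HDFS output path.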
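The %s placeholders suggest the command is filled in from a script. A minimal sketch of one way to do that in Python follows, assuming the jar location given for Hadoop 2.7.2. The function name build_streaming_command and its parameters are hypothetical and only serve as illustration.

```python
import os


def build_streaming_command(hadoop_home, input_path, output_path,
                            mapper_script="map.py"):
    """Return the argument list for the streaming job described in this post.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Jar location as given for Hadoop 2.7.2.
    jar = os.path.join(hadoop_home, "share", "hadoop", "tools", "lib",
                       "hadoop-streaming-2.7.2.jar")
    return [
        "hadoop", "jar", jar,
        "-file", mapper_script,               # ship the script to the cluster
        "-mapper", "python " + mapper_script,
        "-reducer", "cat",                    # pass mapper output through
        "-input", input_path,
        "-output", output_path,
    ]
```

Building a list rather than one formatted string avoids shell quoting problems. The result can be handed to subprocess.check_call to submit the job.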
Basic structure of map.py:
#!/usr/bin/env python
#coding:utf-8
import sys

for line in sys.stdin:
    line = line.strip()
    ...  # process the line here
    print(...)  # write the result to stdout
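Note: map.py processes the HDFS input line by line and its output is written back to HDFS. There is no real reduce logic: the reducer is cat, which passes the mapper output through unchanged.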
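To make the skeleton concrete, here is a minimal sketch of a complete mapper. The per-line logic (emitting each word with a count of 1, tab-separated) is an assumption chosen for illustration and is not from the original post. The names map_line and run are likewise hypothetical.

```python
#!/usr/bin/env python
# coding: utf-8
import sys


def map_line(line):
    """Turn one input line into zero or more output records.

    Hypothetical logic: emit "word<TAB>1" for each whitespace-separated word.
    """
    return ["%s\t1" % word for word in line.strip().split()]


def run(stream, out):
    """Read lines from stream and write every mapped record to out."""
    for line in stream:
        for record in map_line(line):
            out.write(record + "\n")


if __name__ == "__main__":
    # Hadoop Streaming feeds input on stdin and collects output from stdout.
    run(sys.stdin, sys.stdout)
```

Keeping the per-line logic in its own function makes it easy to check without a cluster, and the same script still works in a local pipe such as cat input.txt | python map.py.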