欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

第一个python实现的mapreduce程序

程序员文章站 2022-07-14 20:31:07
...

map:

# !/usr/bin/env python

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
    print ("%s\t%s") % (word, 1)

reduce:

#!/usr/bin/env python
import operator
import sys
current_word = None
curent_count = 0
word = None
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
            continue
    if current_word == word:
        curent_count += count
    else:
        if current_word:
            print '%s\t%s' % (current_word,curent_count)
        current_word=word
        curent_count=count

if current_word==word:
    print '%s\t%s' % (current_word,curent_count)

测试:

[aaa@qq.com input]# echo "foo foo quux labs foo bar zoo zoo hying" | /home/hadoop/input/max_map.py | sort | /home/hadoop/input/max_reduce.py

第一个python实现的mapreduce程序

执行:可将其写入脚本文件

 //注意\-file之间一定不能空格
hadoop jar /hadoop64/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-*streaming*.jar -D stream.non.zero.exit.is.failure=false \-file /home/hadoop/input/max_map.py -mapper /home/hadoop/input/max_map.py \-file /home/hadoop/input/max_reduce.py  -reducer /home/hadoop/input/max_reduce.py \-input /input/temperature/ -output /output/temperature

第一个python实现的mapreduce程序

相关标签: mapreduce python