A first MapReduce program implemented in Python

程序员文章站 2022-07-14 20:31:07


map:

#!/usr/bin/env python
# Mapper: emit "<word>\t1" for every word read from standard input.

import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()

    for word in words:
        print("%s\t%s" % (word, 1))

reduce:

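#!/usr/bin/env python
# Reducer: sum the counts of each word. The input must be sorted by word,
# which `sort` (locally) or the Hadoop shuffle phase takes care of.
import sys

current_word = None
current_count = 0
for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # Skip lines with no tab or with a non-numeric count.
        continue
    if current_word == word:
        current_count += count
    else:
        # A new word begins: emit the total for the previous one.
        if current_word:
            print('%s\t%s' % (current_word, current_count))
        current_word = word
        current_count = count

# Emit the total for the last word.
if current_word:
    print('%s\t%s' % (current_word, current_count))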
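The reducer relies on its input being sorted, so that all lines for one word arrive consecutively. The same idea can be sketched with `itertools.groupby`, which groups consecutive items sharing a key. This is only an alternative sketch, not the version used in this post; the function names `parse` and `reduce_counts` are made up for illustration.

```python
from itertools import groupby


def parse(lines):
    """Yield (word, count) pairs from tab-separated lines, skipping malformed ones."""
    for line in lines:
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) != 2:
            continue  # no tab on this line
        try:
            yield parts[0], int(parts[1])
        except ValueError:
            continue  # count is not an integer


def reduce_counts(lines):
    """Sum the counts of each run of identical words (input sorted by word)."""
    for word, group in groupby(parse(lines), key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)
```

To use it as a streaming reducer, iterate over `reduce_counts(sys.stdin)` and print each pair as `word<TAB>total`.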

Test:

[aaa@qq.com input]# echo "foo foo quux labs foo bar zoo zoo hying" | /home/hadoop/input/max_map.py | sort | /home/hadoop/input/max_reduce.py
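The shell pipeline above (map, then `sort`, then reduce) can also be simulated in a single Python process, which is handy for checking the logic without Hadoop. The sketch below is a rough stand-in for the two scripts, not a copy of them; `map_words`, `shuffle`, and `reduce_sorted` are hypothetical names, and `sorted()` plays the role of `sort`.

```python
def map_words(lines):
    """Map step: emit (word, 1) for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Stand-in for `sort`: bring equal words next to each other."""
    return sorted(pairs)


def reduce_sorted(pairs):
    """Reduce step: sum the counts of consecutive equal words."""
    current_word, current_count = None, 0
    for word, count in pairs:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield current_word, current_count
            current_word, current_count = word, count
    if current_word is not None:
        yield current_word, current_count


text = ["foo foo quux labs foo bar zoo zoo hying"]
result = list(reduce_sorted(shuffle(map_words(text))))
for word, total in result:
    print("%s\t%d" % (word, total))
# Prints (tab-separated): bar 1, foo 3, hying 1, labs 1, quux 1, zoo 2
```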


Run: the command can be saved in a script file.

 // Note: there must be no space between \ and -file
hadoop jar /hadoop64/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-*streaming*.jar -D stream.non.zero.exit.is.failure=false \-file /home/hadoop/input/max_map.py -mapper /home/hadoop/input/max_map.py \-file /home/hadoop/input/max_reduce.py  -reducer /home/hadoop/input/max_reduce.py \-input /input/temperature/ -output /output/temperature
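One way to keep this command in a script file is a small Python launcher. The sketch below only reassembles the arguments shown above; the paths are copied from that command and will differ on other clusters, and `build_command` and `run` are hypothetical helper names. Because no shell is involved, the jar wildcard is expanded with `glob` and no backslash escaping is needed.

```python
import glob
import subprocess

# Paths copied from the command above; adjust them for your cluster.
JAR_PATTERN = "/hadoop64/hadoop-2.7.1/share/hadoop/tools/lib/hadoop-*streaming*.jar"
MAPPER = "/home/hadoop/input/max_map.py"
REDUCER = "/home/hadoop/input/max_reduce.py"
INPUT_DIR = "/input/temperature/"
OUTPUT_DIR = "/output/temperature"


def build_command(jar):
    """Return the hadoop streaming command as an argument list."""
    return [
        "hadoop", "jar", jar,
        "-D", "stream.non.zero.exit.is.failure=false",
        "-file", MAPPER, "-mapper", MAPPER,
        "-file", REDUCER, "-reducer", REDUCER,
        "-input", INPUT_DIR, "-output", OUTPUT_DIR,
    ]


def run():
    """Locate the streaming jar and launch the job; returns the exit code."""
    jars = glob.glob(JAR_PATTERN)
    if not jars:
        raise FileNotFoundError("no streaming jar matches " + JAR_PATTERN)
    return subprocess.call(build_command(jars[0]))
```

Calling `run()` submits the job. The output directory must not already exist, otherwise Hadoop refuses to start the job.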
