Using Python, Big Data Techniques, and Pipes to Analyze Operating System Logs

程序员文章站 2022-07-04 09:08:10

1 Code

Map code (Hadoop_Map.py)
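import os
import re
import sys

def Map(sourceFile):
    # Report a missing file on stderr so the message never reaches the reducer's stdin.
    if not os.path.exists(sourceFile):
        print(sourceFile, 'does not exist.', file=sys.stderr)
        return
    # Matches dates such as 07/10/2013: 1-2 digits, 1-2 digits, then 4 digits.
    pattern = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')
    with open(sourceFile, 'r') as srcFile:
        for dataLine in srcFile:
            r = pattern.findall(dataLine)
            if r:
                # Emit "date , 1" for the first date on the line; print()
                # puts a space on each side of the comma.
                print(r[0], ',', 1)

if __name__ == '__main__':
    # Use the file named on the command line, falling back to test.txt.
    Map(sys.argv[1] if len(sys.argv) > 1 else 'test.txt')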
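
The matching step in the Map code can be tried on its own. The following is a minimal sketch, not part of the original scripts: the helper name first_date and the sample log lines are invented for illustration, and only the regular expression is taken from the Map code.

```python
import re

# Same pattern as in the Map code.
pattern = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')

def first_date(line):
    """Return the first date-like token in a line, or None if there is none.

    Hypothetical helper, introduced only for this illustration.
    """
    found = pattern.findall(line)
    return found[0] if found else None

# Hypothetical log lines, invented for illustration only.
samples = [
    '07/10/2013 08:15:42 Service Control Manager started',
    'no date on this line',
    'renewed 7/11/2013, expires 8/15/2013',
]

for s in samples:
    print(first_date(s))
```

Only the first match on each line is used, so a line that mentions two dates is counted once. The matched text is used as the key exactly as written, so 7/11/2013 and 07/11/2013 would be counted as two different dates.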
 
Reduce code (Hadoop_Reduce.py)
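import sys

def Reduce(targetFile):
    result = {}
    for line in sys.stdin:
        # Skip blank or malformed lines that lack the "date , count" shape.
        if ',' not in line:
            continue
        # riqi is the date key, shuliang the count emitted by the mapper.
        riqi, shuliang = line.strip().split(',', 1)
        result[riqi] = result.get(riqi, 0) + int(shuliang)
    # Write one "date:count" line per date.
    with open(targetFile, 'w') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')

if __name__ == '__main__':
    Reduce('result.txt')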
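
For checking the counting logic without a shell pipe, the map and reduce steps can be combined in one process. This is a sketch under stated assumptions rather than the article's method: it swaps the two piped scripts for a single function built on collections.Counter, and the names DATE_PATTERN and count_dates as well as the input lines are hypothetical.

```python
import re
from collections import Counter

# Same date pattern as the Map code; the constant name is hypothetical.
DATE_PATTERN = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')

def count_dates(lines):
    """Count lines per date, using the first date found on each line."""
    counts = Counter()
    for line in lines:
        found = DATE_PATTERN.findall(line)
        if found:
            counts[found[0]] += 1
    return counts

# Hypothetical input lines, invented for illustration only.
lines = [
    '07/10/2013 event A',
    '07/10/2013 event B',
    '07/11/2013 event C',
    'line without a date',
]

for date, n in count_dates(lines).items():
    print(date + ':' + str(n))
```

The piped version keeps the two stages in separate processes, which is closer to how a streaming MapReduce job is laid out; the in-process version is only convenient for quick tests on small inputs.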
 
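2 Results
Run the following command at the command line. The reducer writes the per-date counts to result.txt; the counts are listed after the command: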
E:\python\python可以这样学\第11章 大数据处理\code>python Hadoop_Map.py test.txt | python Hadoop_Reduce.py
07/10/2013 :4635
07/11/2013 :1
07/16/2013 :51
08/15/2013 :3958
10/09/2013 :733
12/11/2013 :564
02/12/2014 :4102
05/14/2014 :737
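
The result lines can be post-processed further, for example to find the busiest date and the total number of dated log lines. The sketch below is an illustration only: the helper name parse_result is hypothetical, the result lines are copied from the listing above, and the month/day/year reading is taken from values such as 07/16/2013, where the second field exceeds 12.

```python
from datetime import datetime

# Lines copied from the result listing.
result_lines = [
    '07/10/2013 :4635',
    '07/11/2013 :1',
    '07/16/2013 :51',
    '08/15/2013 :3958',
    '10/09/2013 :733',
    '12/11/2013 :564',
    '02/12/2014 :4102',
    '05/14/2014 :737',
]

def parse_result(line):
    """Split 'MM/DD/YYYY :count' into a (datetime, int) pair.

    Hypothetical helper, introduced only for this illustration.
    """
    date_text, count_text = line.rsplit(':', 1)
    return datetime.strptime(date_text.strip(), '%m/%d/%Y'), int(count_text)

parsed = [parse_result(line) for line in result_lines]

# Date with the most log lines, and the total across all dates.
busiest_date, busiest_count = max(parsed, key=lambda pair: pair[1])
total = sum(count for _, count in parsed)

print(busiest_date.strftime('%m/%d/%Y'), busiest_count, total)
```

Parsing the keys into datetime values also makes it possible to sort or group the counts by month or year, which plain string keys in this format do not support directly.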