Using Python, Big Data Techniques, and Pipes to Analyze Operating System Logs
程序员文章站
2022-07-04 09:08:10
...
1. Code
Map code (Hadoop_Map.py)
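import os
import re
import sys


def Map(sourceFile):
    # Emit one "date , 1" line for every log line that contains a date
    if not os.path.exists(sourceFile):
        print(sourceFile, 'does not exist.', file=sys.stderr)
        return
    pattern = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')
    with open(sourceFile, 'r') as srcFile:
        for dataLine in srcFile:
            r = pattern.findall(dataLine)
            if r:
                # print() joins its arguments with spaces: "07/10/2013 , 1"
                print(r[0], ',', 1)


if __name__ == '__main__':
    # Use the file named on the command line, defaulting to test.txt
    Map(sys.argv[1] if len(sys.argv) > 1 else 'test.txt')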
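The map step boils down to the regular expression. A minimal sketch, assuming the log is available as an iterable of strings: the hypothetical helper `map_lines` applies the same pattern as Hadoop_Map.py but yields `(date, 1)` pairs instead of printing them, which makes the extraction easy to check. The sample lines are made up for illustration.

```python
import re

# Same pattern as Hadoop_Map.py: 1-2 digits / 1-2 digits / 4 digits
DATE_PATTERN = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')


def map_lines(lines):
    """Yield (date, 1) for the first date found on each line (hypothetical helper)."""
    for line in lines:
        found = DATE_PATTERN.findall(line)
        if found:
            yield found[0], 1


# Made-up sample lines, not taken from the article's test.txt
sample = [
    'Info 07/10/2013 10:15:02 service started',
    'a line without any date',
    'Warning 7/16/2013 3:01:44 PM, next check 12/31/2013',
]
print(list(map_lines(sample)))  # [('07/10/2013', 1), ('7/16/2013', 1)]
```

Because the pattern accepts one- or two-digit fields, `7/16/2013` and `07/16/2013` would be counted as different keys; normalizing the date before emitting it would merge them.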
Reduce code (Hadoop_Reduce.py)
import sys


def Reduce(targetFile):
    # Sum the counts per date read from stdin; riqi = date, shuliang = count
    result = {}
    for line in sys.stdin:
        riqi, shuliang = line.strip().split(',')
        result[riqi] = result.get(riqi, 0) + int(shuliang)
    # Write one "date:count" line per date to targetFile
    with open(targetFile, 'w') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')


if __name__ == '__main__':
    Reduce('result.txt')
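The reducer above keeps every date in a dictionary, which works as long as the number of distinct dates is small. On a Hadoop Streaming cluster, the framework sorts map output by key before it reaches the reducer, so a reducer can instead total one key at a time. A sketch of that style with `itertools.groupby`, assuming key-sorted `key,count` input; `reduce_sorted` is a hypothetical name, and `sorted()` stands in for the shuffle/sort step.

```python
import itertools


def reduce_sorted(lines):
    """Sum counts for 'key,count' lines that are already sorted by key.

    Assumes identical keys arrive next to each other (hypothetical helper).
    """
    pairs = (line.strip().split(',') for line in lines if line.strip())
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0].strip()):
        yield key, sum(int(count) for _, count in group)


# Lines in the format printed by Hadoop_Map.py
mapped = ['07/10/2013 , 1', '08/15/2013 , 1', '07/10/2013 , 1']
for date, total in reduce_sorted(sorted(mapped)):
    print(date + ':' + str(total))
# 07/10/2013:2
# 08/15/2013:1
```

Unlike the dictionary version, this reducer splits a date into several partial totals if its input is not sorted, so it relies on the sort step being present.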
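2. Results
Run the following command on the command line; the reducer writes its per-date counts to result.txt, whose contents are shown below: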
E:\python\python可以这样学\第11章 大数据处理\code>python Hadoop_Map.py test.txt | python Hadoop_Reduce.py
07/10/2013 :4635
07/11/2013 :1
07/16/2013 :51
08/15/2013 :3958
10/09/2013 :733
12/11/2013 :564
02/12/2014 :4102
05/14/2014 :737
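To cross-check result.txt without starting two processes, the map and reduce steps can be folded into a single pass. A minimal sketch, assuming the log is an iterable of lines; `count_dates` is a hypothetical helper and `collections.Counter` plays the role of the reducer's dictionary. Its keys carry no trailing space, so a line reads `07/10/2013:2` rather than `07/10/2013 :2`.

```python
import collections
import re

DATE_PATTERN = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')


def count_dates(lines):
    """Count lines per first date found, i.e. map and reduce in one pass."""
    counts = collections.Counter()
    for line in lines:
        found = DATE_PATTERN.findall(line)  # map: extract the date key
        if found:
            counts[found[0]] += 1  # reduce: add 1 for this date
    return counts


# Made-up sample lines for illustration
log = ['07/10/2013 boot', '07/10/2013 login', '08/15/2013 update', 'no date here']
for date, total in count_dates(log).items():
    print(date + ':' + str(total))
# 07/10/2013:2
# 08/15/2013:1
```

Passing an open handle to test.txt instead of `log` gives the per-date totals in one process, in first-seen order like the reducer's dictionary.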