Example: Big-Data Style Analysis of Operating System Logs in Python
程序员文章站
2022-05-30 22:50:04
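This example shows how to analyze a large operating system log with Python in three steps: split the file, map each piece, and reduce the partial results. The details are as follows: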
I. Code
1. Splitting the large file
import os


def filesplit(sourcefile, targetfolder):
    if not os.path.isfile(sourcefile):
        print(sourcefile, ' does not exist.')
        return
    if not os.path.isdir(targetfolder):
        os.mkdir(targetfolder)

    number = 1000  # lines per output file
    filenum = 1
    # Name the pieces after the source file, without directory or extension
    basename = os.path.splitext(os.path.basename(sourcefile))[0]
    with open(sourcefile, 'r') as srcfile:
        # Keep the trailing newline; stripping it would glue the first
        # line to the second one in the output
        dataline = srcfile.readline()
        while dataline:
            tempdata = []
            # Collect up to `number` lines for the current piece
            for i in range(number):
                tempdata.append(dataline)
                dataline = srcfile.readline()
                if not dataline:
                    break
            desfile = os.path.join(targetfolder, basename + str(filenum) + '.txt')
            with open(desfile, 'a+') as f:
                f.writelines(tempdata)
            filenum = filenum + 1


if __name__ == '__main__':
    #sourcefile = input('input the source file to split:')
    #targetfolder = input('input the target folder you want to place the split files:')
    sourcefile = 'test.txt'
    targetfolder = 'test'
    filesplit(sourcefile, targetfolder)
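For comparison, the chunking logic of the splitter can also be sketched with `itertools.islice`, which takes at most a fixed number of lines per pass. The helper name `iter_chunks` and the in-memory sample are assumptions made for illustration only; they are not part of the original program.

```python
import io
from itertools import islice


def iter_chunks(lines, size):
    """Yield lists of at most `size` consecutive lines (hypothetical helper)."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# An in-memory stand-in for a log file, so the sketch runs without touching disk.
sample = io.StringIO(''.join('line %d\n' % i for i in range(7)))
chunks = list(iter_chunks(sample, 3))
print([len(c) for c in chunks])
```

Each yielded chunk could then be written to its own file, as `filesplit` does with `tempdata`.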
2. Mapper code
import os
import re
import threading


def map(sourcefile):
    if not os.path.exists(sourcefile):
        print(sourcefile, ' does not exist.')
        return
    # Dates written as M/D/YYYY or MM/DD/YYYY
    pattern = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')
    result = {}
    with open(sourcefile, 'r') as srcfile:
        for dataline in srcfile:
            r = pattern.findall(dataline)
            if r:
                # Count only the first date found on each line
                result[r[0]] = result.get(r[0], 0) + 1
    desfile = sourcefile[0:-4] + '_map.txt'
    with open(desfile, 'a+') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')


if __name__ == '__main__':
    desfolder = 'test'
    files = os.listdir(desfolder)

    # Without multithreading, this loop is enough:
    '''for f in files:
        map(os.path.join(desfolder, f))'''

    # With multithreading: one thread per split file
    def main(i):
        map(os.path.join(desfolder, files[i]))

    filenumber = len(files)
    for i in range(filenumber):
        t = threading.Thread(target=main, args=(i,))
        t.start()
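To see what the mapper's regular expression picks up, here is a minimal sketch that applies the same pattern to a few lines held in memory. The helper `count_dates` and the sample log lines are made up for illustration; the real log format is not shown in the article.

```python
import re
from collections import Counter

# Same pattern as in the mapper: M/D/YYYY or MM/DD/YYYY
DATE_PATTERN = re.compile(r'[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}')


def count_dates(lines):
    """Count the first date found on each line (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


# Made-up log lines, only for illustration.
sample_lines = [
    'Information 07/10/2013 12:01:33 Service started',
    'Warning 07/10/2013 12:05:10 Disk almost full',
    'Error 08/15/2013 09:00:00 Driver failed',
    'a line without any date',
]
counts = count_dates(sample_lines)
print(dict(counts))
```

`search` returns the first match on a line, which corresponds to the `r[0]` taken from `findall` in the mapper.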
3. Reducer code
import os


def reduce(sourcefolder, targetfile):
    if not os.path.isdir(sourcefolder):
        print(sourcefolder, ' does not exist.')
        return
    result = {}
    # Deal only with the mapped files
    allfiles = [os.path.join(sourcefolder, f)
                for f in os.listdir(sourcefolder)
                if f.endswith('_map.txt')]
    for f in allfiles:
        with open(f, 'r') as fp:
            for line in fp:
                line = line.strip()
                if not line:
                    continue
                # Each line has the form date:count
                position = line.index(':')
                key = line[0:position]
                value = int(line[position + 1:])
                result[key] = result.get(key, 0) + value
    with open(targetfile, 'w') as fp:
        for k, v in result.items():
            fp.write(k + ':' + str(v) + '\n')


if __name__ == '__main__':
    reduce('test', os.path.join('test', 'result.txt'))
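The merge performed by the reducer can be illustrated in memory with `collections.Counter`. This is only a sketch: `merge_counts` is a hypothetical helper, the input lines are invented, and it splits at the last colon with `rpartition`, which gives the same result as the reducer's `index(':')` for keys that contain no colon.

```python
from collections import Counter


def merge_counts(lines):
    """Sum the values of 'key:count' lines (hypothetical helper)."""
    total = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, value = line.rpartition(':')
        total[key] += int(value)
    return total


# Made-up contents of two *_map.txt files, concatenated.
mapped = ['07/10/2013:3\n', '08/15/2013:1\n', '\n', '07/10/2013:2\n']
merged = merge_counts(mapped)
print(dict(merged))
```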
II. Results
Run the three programs above in order to get the final result:
07/10/2013:4634
07/16/2013:51
08/15/2013:3958
07/11/2013:1
10/09/2013:733
12/11/2013:564
02/12/2014:4102
05/14/2014:737
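The result lines above are not in chronological order. A minimal sketch for sorting them follows, assuming the dates are written as MM/DD/YYYY (the entry 07/16/2013 only parses that way); the helper `date_of` is introduced here for illustration.

```python
from datetime import datetime

# The eight result lines listed above.
results = [
    '07/10/2013:4634',
    '07/16/2013:51',
    '08/15/2013:3958',
    '07/11/2013:1',
    '10/09/2013:733',
    '12/11/2013:564',
    '02/12/2014:4102',
    '05/14/2014:737',
]


def date_of(entry):
    """Parse the date before the colon, assuming MM/DD/YYYY."""
    return datetime.strptime(entry.split(':')[0], '%m/%d/%Y')


ordered = sorted(results, key=date_of)
total = sum(int(entry.split(':')[1]) for entry in results)
for entry in ordered:
    print(entry)
print('total matched lines:', total)
```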
I hope this article is helpful for your Python programming.