Using the Python pipe module
pipe is not part of Python's standard library. If you have easy_install available you can install it directly; otherwise you need to download it yourself: https://pypi.python.org/pypi/pipe
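This library is worth introducing because it shows a novel way to use iterators and generators: streams. pipe treats iterable data as a stream and, much like a Linux shell, passes the stream along with '|'. It defines a set of "stream processing" functions that accept a stream and process it, and that in the end either emit another stream or reduce the stream to a single result. Let's look at some examples.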
The first one is very simple, summing with add:
[python]
>>> from pipe import *
>>> range(5) | add
10
Summing the even numbers requires where, which works like the built-in filter and keeps only the elements that satisfy a condition:
[python]
>>> range(5) | where(lambda x: x % 2 == 0) | add
6
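For comparison, here is a minimal sketch of the same two results computed with built-ins only. It does not use pipe at all; it only illustrates what add and where correspond to in plain Python, and the variable names are chosen here for illustration.

```python
# Plain-Python counterparts of the two pipe expressions above (no pipe import needed).
numbers = range(5)

# range(5) | add  ->  sum of all elements
total = sum(numbers)

# range(5) | where(...) | add  ->  filter first, then sum
even_total = sum(filter(lambda x: x % 2 == 0, numbers))

print(total, even_total)  # 10 6
```

The pipe version reads left to right in the order the data flows, while the built-in version nests the calls inside out.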
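Remember the Fibonacci generator we defined earlier? To sum all even terms of the sequence below 10000 we need take_while, which works much like itertools.takewhile and takes elements until the condition no longer holds: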
[python]
def fibonacci():
    # Infinite generator: 1, 1, 2, 3, 5, 8, ...
    a = b = 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b

>>> fib = fibonacci
>>> (fib() | where(lambda x: x % 2 == 0)
...        | take_while(lambda x: x < 10000)
...        | add)
3382
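As a cross-check, the same sum can be sketched with the standard library alone. This version uses itertools.takewhile and a generator expression instead of pipe's take_while and where; the names evens and even_sum are introduced here only for illustration.

```python
from itertools import takewhile


def fibonacci():
    """Yield the Fibonacci sequence 1, 1, 2, 3, 5, ... indefinitely."""
    a = b = 1
    yield a
    yield b
    while True:
        a, b = b, a + b
        yield b


# Lazily keep the even terms, then stop at the first one that is >= 10000.
evens = (x for x in fibonacci() if x % 2 == 0)
even_sum = sum(takewhile(lambda x: x < 10000, evens))
print(even_sum)  # 3382
```

Because both steps are lazy, the infinite generator is only consumed up to the first even term that fails the condition.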
To apply a function to each element, use select, which works like the built-in map; to get a list, use as_list:
[python]
>>> fib() | select(lambda x: x ** 2) | take_while(lambda x: x < 100) | as_list
[1, 1, 4, 9, 25, 64]
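pipe includes many more stream processing functions. You can even define your own: write a generator function and apply the Pipe decorator. The following defines a stream processing function that yields elements until the index no longer satisfies a condition:
[python]
>>> @Pipe
... def take_while_idx(iterable, predicate):
...     for idx, x in enumerate(iterable):
...         if predicate(idx): yield x
...         else: return
...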
Use this function to get the first 10 Fibonacci numbers:
[python]
>>> fib() | take_while_idx(lambda x: x < 10) | as_list
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
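I won't cover the remaining functions here; you can read pipe's source file. Of its fewer than 600 lines, about 300 are documentation, and that documentation contains plenty of examples. The implementation is very simple: the Pipe decorator wraps an ordinary generator function (or a function that returns an iterator) in an instance of a plain class that implements the __ror__ method. Simple as it is, the idea is genuinely interesting.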
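The mechanism described above can be sketched in a few lines. The class below is an illustrative stand-in written for this article, not pipe's actual source, and where and as_list are re-implemented here under that assumption so the example runs without the library installed.

```python
class Pipe:
    """Illustrative stand-in for pipe's decorator class (an assumption, not the library's code)."""

    def __init__(self, function):
        self.function = function

    def __ror__(self, other):
        # Python calls this for `other | self` when `other` cannot handle `|` itself.
        return self.function(other)

    def __call__(self, *args, **kwargs):
        # Bind the extra arguments now; the iterable arrives later through `|`.
        return Pipe(lambda iterable: self.function(iterable, *args, **kwargs))


@Pipe
def where(iterable, predicate):
    return (x for x in iterable if predicate(x))


@Pipe
def take_while_idx(iterable, predicate):
    for idx, x in enumerate(iterable):
        if predicate(idx):
            yield x
        else:
            return


@Pipe
def as_list(iterable):
    return list(iterable)


result = range(20) | where(lambda x: x % 2 == 0) | take_while_idx(lambda i: i < 3) | as_list
print(result)  # [0, 2, 4]
```

Since range objects and generators define no `|` operator of their own, Python falls back to the right-hand operand's __ror__, which is what lets an ordinary iterable appear on the left of the pipeline.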
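An interview question:
Read a file, count how many times each word occurs in it, and sort the words by count in descending order.
On its own this is a fairly ordinary problem, but it becomes interesting when combined with pipe, which suits this kind of data-flow processing well. After spending a little time on it, I wrote the following code: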
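[python]
# coding=utf-8
from re import split
from pipe import *

# Replace this path with the file you want to analyze.
with open(r'c:usersadministratordesktop.py') as f:
    print(f.read()
          | Pipe(lambda x: split(r'\W+', x))              # split on runs of non-word characters
          | Pipe(lambda x: (i for i in x if i.strip()))   # drop empty strings
          | groupby(lambda x: x)                          # group identical words together
          | select(lambda x: (x[0], (x[1] | count)))      # (word, number of occurrences)
          | sort(key=lambda x: x[1], reverse=True)        # most frequent first
          )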
Output: [('request', 91), ('post', 81), ('and', 38), ('u', 36), ('if', 33), ('in', 32), ('team', 29), ('line', 23), ('objects', 20), ('gcmgroups', 16), ('get', 14), ('import', 14), ('save', 13), ('str', 12), ('0', 11), ('1', 11), ('i', 11), ('false', 10), ('gcwgroups', 9), ('from', 9), ('group_name', 9), ('path', 9), ('team_groups', 9), ('add', 8), ('else', 8), ('extra_context', 8), ('form2', 8), ('return', 8), ('area', 7), ('baoming', 7), ('cname', 7), ('cname1', 7), ('cname2', 7), ('form1', 7), ('mysql_cur', 7), ('8', 6), ('gender', 6), ('is_del', 6), ('time', 6), ('user', 6), ('20', 5), ('7', 5), ('def', 5), ('depth', 5), ('for', 5), ('gcwteam', 5), ('radio1', 5), ('13', 4), ('16', 4), ('2', 4), ('2013', 4), ('5', 4), ('gb2312', 4), ('gcwmember', 4), ('gcwmemberform', 4), ('gcwteam', 4), ('gcwteamform', 4), ('httpresponseredirect', 4), ('age', 4), ('append', 4), ('area1', 4), ('cad_id', 4), ('csv', 4), ('django', 4), ('email', 4), ('encode', 4), ('fax', 4), ('gr_name', 4), ('lines', 4), ('name', 4), ('ob', 4), ('phone', 4), ('qq', 4), ('response', 4), ('status', 4), ('team_user', 4), ('template_name', 4), ('116', 3), ('12', 3), ('4', 3), ('requestcontext', 3), ('true', 3), ('a', 3), ('areas', 3), ('cname3', 3), ('community', 3), ('create', 3), ('csa', 3), ('diyi', 3), ('filter', 3), ('gcmmember', 3), ('gcw', 3), ('hd_cont', 3), ('id', 3), ('list', 3), ('mysql_db', 3), ('pp', 3), ('radio2', 3), ('radio3', 3), ('radio4', 3), ('radio9', 3), ('render_to_response', 3), ('result', 3), ('shiyun', 3), ('sys', 3), ('t_id', 3), ('textfield10', 3), ('textfield11', 3), ('textfield12', 3), ('textfield13', 3), ('textfield14', 3), ('textfield15', 3), ('textfield16', 3), ('textfield5', 3), ('textfield6', 3), ('textfield7', 3), ('textfield8', 3), ('textfield9', 3), ('title', 3), ('topic', 3), ('writers', 3), ('3', 2), ('50', 2), ('from', 2), ('http404', 2), ('httpresponse', 2), ('mysqldb', 2), ('select', 2), ('where', 2), ('all', 2), ('area2', 2), ('area3', 2), ('baoming_user', 2), 
('close', 2), ('commit', 2), ('context_instance', 2), ('cut_pages', 2), ('diqu', 2), ('except', 2), ('execute', 2), ('ftp', 2), ('ftp_status', 2), ('gcw_baoming_list', 2), ('gcw_team', 2), ('get_full_area', 2), ('group_community', 2), ('group_farmer', 2), ('group_org', 2), ('group_other', 2), ('group_pupils', 2), ('group_students', 2), ('group_tertiary', 2), ('group_troops', 2), ('is_valid', 2), ('len', 2), ('login_required', 2), ('models', 2), ('not', 2), ('page', 2), ('pk', 2), ('recommend_type', 2), ('resu', 2), ('root', 2), ('select_sql', 2), ('select_sql_mem', 2), ('set_gcw_ftpd', 2), ('st2', 2), ('todo', 2), ('try', 2), ('url', 2), ('username', 2), ('utf', 2), ('10', 1), ('11', 1), ('168', 1), ('17', 1), ('18', 1), ('192', 1), ('210', 1), ('9', 1), ('content', 1), ('disposition', 1), ('e', 1), ('qq', 1), ('arraysize', 1), ('attachment', 1), ('auth', 1), ('baoshaowei', 1), ('break', 1), ('charset', 1), ('cleaned_data', 1), ('coding', 1), ('connect', 1), ('contrib', 1), ('cursor', 1), ('d', 1), ('datetime', 1), ('db', 1), ('decorators', 1), ('en', 1), ('excel', 1), ('extend', 1), ('fetchall', 1), ('fetchmany', 1), ('filename', 1), ('forms', 1), ('ftpd', 1), ('gcw130', 1), ('gcw_baoming', 1), ('gcw_baoming_csv', 1), ('gcw_shipin_status', 1), ('gcwteam_set', 1), ('get_object_or_404', 1), ('hbl', 1), ('hbl_cassi', 1), ('host', 1), ('html', 1), ('http', 1), ('insert', 1), ('int', 1), ('is_captain', 1), ('m_author', 1), ('m_name', 1), ('mail', 1), ('method', 1), ('mimetype', 1), ('order_by', 1), ('pages', 1), ('passwd', 1), ('print', 1), ('raise', 1), ('range', 1), ('re', 1), ('recommend_name', 1), ('reload', 1), ('setdefaultencoding', 1), ('shortcuts', 1), ('team_age', 1), ('team_area', 1), ('team_area_id', 1), ('team_man_num', 1), ('team_name', 1), ('team_num', 1), ('team_woman_num', 1), ('template', 1), ('text', 1), ('textfield21', 1), ('textfield22', 1), ('textfield23', 1), ('textfield24', 1), ('textfield25', 1), ('textfield26', 1), ('textfield61', 1), 
('textfield71', 1), ('textfield81', 1), ('topic_gcwmember', 1), ('topic_gcwteam', 1), ('userdb', 1), ('users', 1), ('utf8', 1), ('util', 1), ('views', 1), ('while', 1), ('wohnort3', 1), ('works_long', 1), ('works_name', 1), ('works_type', 1), ('writer', 1), ('writerow', 1), ('writerows', 1)]
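For reference, the same task can also be sketched with the standard library alone, using collections.Counter and re.findall in place of the pipe chain. The helper name word_counts and the sample sentence are hypothetical; the sentence stands in for the file contents so the example is self-contained.

```python
import re
from collections import Counter


def word_counts(text):
    """Return (word, count) pairs sorted by count, highest first."""
    # \w+ matches the runs of word characters that remain after splitting on \W+.
    words = re.findall(r'\w+', text)
    return Counter(words).most_common()


sample = "to be or not to be"  # hypothetical input standing in for the file contents
print(word_counts(sample))
```

This is shorter, but the pipe version makes each processing step visible as one stage of the stream.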