Python Script for Scraping Discuz! Usernames

程序员文章站 2022-06-20 12:49:28


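I've been learning Python recently, so I wrote a Python script that scrapes Discuz! usernames. The code is short and rather crude. The idea is simple: use a regular expression to match the page title, extract the username from it, and append it to a text file. The script uses the Baidu Webmaster Community forum (百度站长社区, which has over 400,000 users) as its target. I left it running unattended on a VPS. Although it paused between requests, I later found it had collected only a little over 50,000 usernames before the site blocked it.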
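The title-matching step described above can be sketched in Python 3. This is a minimal illustration, not the author's code: it assumes the profile page's `<title>` begins with the username followed by 的个人资料 ("'s profile") and deliberately ignores the rest of the title, whose exact wording on the real site is not certain. The function name `extract_username` and the sample title are made up for illustration.

```python
import re

# Assumption: a Discuz! profile page <title> begins with "<username>的个人资料".
# The non-greedy (.*?) stops at the first "的个人资料", so whatever follows
# (forum name, "Powered by Discuz!", etc.) does not have to match exactly.
TITLE_PATTERN = re.compile(r"<title>\s*(.*?)的个人资料", re.S)


def extract_username(html: str):
    """Return the username from a profile page's HTML, or None if the title doesn't match."""
    match = TITLE_PATTERN.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    # Made-up sample title, for illustration only
    sample = "<html><head><title>天一的个人资料 - 百度站长社区</title></head></html>"
    print(extract_username(sample))  # prints: 天一
```

Matching only up to 的个人资料 with a non-greedy group is less fragile than matching the whole title verbatim, which stops working as soon as the forum changes its title suffix or spacing.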
The code is as follows:


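```python
# -*- coding: utf-8 -*-
# author: 天一
# blog: http://www.90blog.org
# version: 1.0
# Purpose: scrape usernames from the Baidu Webmaster Community (百度站长社区) forum
# Requires Python 2 (urllib2, xrange)

import re
import time
import urllib2

def biduspider():
    # The profile page <title> starts with the username followed by 的个人资料 ("'s profile")
    pattern = re.compile(r'<title>(.*)的个人资料  百度站长社区 </title>')
    # Open the output file once for the whole run instead of once per username
    f = open('theuid.txt', 'a')
    try:
        for uid in xrange(1, 400000):
            theurl = "http://bbs.zhanzhang.baidu.com/home.php?mod=space&uid=" + str(uid)
            try:
                thepage = urllib2.urlopen(theurl).read()
            except urllib2.URLError:
                # An HTTP error (e.g. a deleted uid) or a network failure skips this uid
                # instead of aborting the whole run
                time.sleep(0.5)
                continue
            # Extract the username from the page title
            thefindall = pattern.findall(thepage)
            # Wait 0.5 seconds between requests to avoid being blocked for accessing too often
            time.sleep(0.5)
            if thefindall:
                # Convert the UTF-8 name to GBK so Chinese text displays correctly on a
                # Chinese-locale Windows system; characters GBK cannot represent become '?'
                thedata = thefindall[0].decode('utf-8').encode('gbk', 'replace')
                # Append one username per line
                f.write(thedata + '\n')
    finally:
        f.close()

if __name__ == '__main__':
    biduspider()
```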

The final result:

[Screenshot of the final result]
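The script sleeps a fixed 0.5 seconds between requests, yet it was still blocked after about 50,000 usernames. A common mitigation, sketched below in Python 3 as an assumption rather than anything the original script does, is to randomize the delay and back off exponentially when the server returns a status that looks like throttling. The function names `polite_get` and `urllib_fetch`, the choice of 403/429/503 as throttling signals, and the delay values are all hypothetical; no delay scheme guarantees a site will not block you.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# Hypothetical fetch signature: takes a URL, returns (HTTP status, body text).
Fetch = Callable[[str], Tuple[int, str]]


def urllib_fetch(url: str) -> Tuple[int, str]:
    """Fetch a URL with the standard library, mapping failures to a status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as e:  # must come before URLError (it is a subclass)
        return e.code, ""
    except urllib.error.URLError:
        return 0, ""  # network failure: treated as a non-retryable error below


def polite_get(url: str, fetch: Fetch, base_delay: float = 1.0,
               max_retries: int = 5,
               sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Return the body on HTTP 200, or None on other errors or when retries run out."""
    for attempt in range(max_retries):
        status, body = fetch(url)
        if status == 200:
            # Randomized pause after a success so requests don't arrive at a fixed rhythm
            sleep(base_delay + random.uniform(0, base_delay))
            return body
        if status in (403, 429, 503):
            # Assumed throttling signals: wait base_delay * 2**attempt, plus jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
            continue
        # Any other status (e.g. 404 for a missing uid): give up on this URL
        return None
    return None
```

For example, `polite_get(url, urllib_fetch)` fetches one profile page. Passing `fetch` and `sleep` in as parameters keeps the retry logic separate from the network code, so it can be checked with a fake fetcher and without actually waiting.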
