Python 3.4 crawler demo

程序员文章站 (Programmer Article Station), 2022-06-03 17:13:40


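A crawler written in Python 3.4

This is only a demo. It uses the Baidu Images home page as its example and downloads the images found on that page.

Written with Eclipse PyDev: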

from spidersimple.htmlhelper import *
import sys

# Python 3 strings are already Unicode, so the Python 2 idiom
# reload(sys); sys.setdefaultencoding('utf-8') is not needed here.
html = gethtml('http://image.baidu.com/')
try:
    getimage(html)
    sys.exit()
except Exception as e:
    print(e)

The htmlhelper.py file

In the import above, spidersimple is the name of a custom package.

from urllib.request import urlopen, urlretrieve
# regular expression library
import re

# fetch the page and return its raw bytes
def gethtml(url):
    page = urlopen(url)
    html = page.read()
    return html

# use a regular expression to extract the image URLs from the page
def getimage(html):
    try:
        #reg = r'src="(.+?\.jpg)" class'
        #image = re.compile(reg)
        image = re.compile(r'<img[^>]*src[=\"\']+([^\"\']*)[\"\'][^>]*>', re.I)
        html = html.decode('utf-8')
        imaglist = re.findall(image, html)
        x = 0
        for imagurl in imaglist:
            # download each image into the project folder
            urlretrieve(imagurl, '%s.jpg' % x)
            x += 1
    except Exception as e:
        print(e)
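
The src values found in a page are often relative or protocol-relative, and urlretrieve needs an absolute URL. The following is a minimal sketch, not part of the original demo, of resolving them against the page address with urljoin. The function name extract_image_urls, the slightly stricter pattern, and the example.com host are assumptions made for illustration.

```python
import re
from urllib.parse import urljoin

# Matches the src attribute of an <img> tag; requiring whitespace before
# "src" keeps attributes such as data-src from matching.
IMG_SRC = re.compile(r'<img[^>]*?\ssrc\s*=\s*["\']([^"\']+)["\']', re.I)

def extract_image_urls(html, base_url):
    """Return absolute image URLs found in <img src="..."> attributes."""
    urls = []
    for src in IMG_SRC.findall(html):
        # Inline data: URIs are not downloadable addresses, so skip them.
        if src.startswith('data:'):
            continue
        # urljoin resolves relative and protocol-relative values.
        urls.append(urljoin(base_url, src))
    return urls

if __name__ == '__main__':
    page = '<img src="/img/logo.png"><img src="//cdn.example.com/a.jpg">'
    for url in extract_image_urls(page, 'http://image.baidu.com/'):
        print(url)
```

Each returned URL could then be passed to urlretrieve in place of the raw src value.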

One significant issue to watch out for is Python's default encoding.

You may see an error such as UnicodeDecodeError: 'ascii' codec can't decode byte 0x?? in position 1: ordinal not in range(128). To fix it, set Python's default encoding to UTF-8.
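
Separately from any environment setting, the downloaded bytes can be decoded defensively so that one bad byte does not abort the run. The following is only a sketch under assumptions: the name decode_html and the GBK fallback are illustrative choices, not part of the original demo.

```python
def decode_html(raw, encodings=('utf-8', 'gbk')):
    """Decode page bytes, trying each encoding in turn."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors='replace')

if __name__ == '__main__':
    print(decode_html('百度'.encode('gbk')))
```

In getimage, html.decode('utf-8') could be swapped for a call like this if the target page is not reliably UTF-8.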

The best way to set the UTF-8 default is to write a .bat file that sets the PYTHONIOENCODING environment variable:

@echo off
set PYTHONIOENCODING=utf8
python -u %1

Then restart the computer.
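
For readers who are not on Windows, roughly the same launch can be sketched in Python with subprocess. This is an illustration under assumptions rather than the author's method: the helper name run_with_utf8_io is hypothetical, and it starts the current interpreter with PYTHONIOENCODING=utf8 and the -u flag, as the .bat file does.

```python
import os
import subprocess
import sys

def run_with_utf8_io(args):
    """Run the current Python interpreter with PYTHONIOENCODING=utf8 and -u.

    args is the list of arguments that would follow "python -u",
    for example a script path. Returns the child's stdout as bytes.
    """
    # Copy the current environment and add the encoding override.
    env = dict(os.environ, PYTHONIOENCODING='utf8')
    return subprocess.check_output([sys.executable, '-u'] + list(args), env=env)

if __name__ == '__main__':
    # Ask the child process which encoding its stdout uses.
    out = run_with_utf8_io(['-c', 'import sys; print(sys.stdout.encoding)'])
    print(out.decode('ascii').strip())
```

Printing sys.stdout.encoding in the child process is a quick way to check that the variable took effect.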
