
Python Web Crawler, Part 2: Hands-on Projects with the Requests Library

程序员文章站 2022-07-14 11:03:17

Hands-on web crawling with the Requests library

1 Crawling a JD.com product page

Target page: https://item.jd.com/5089267.html

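import requests

url = 'https://item.jd.com/5089267.html'
try:
    r = requests.get(url, timeout=10)  # Requests has no default timeout, so set one
    r.raise_for_status()               # raise HTTPError for 4xx/5xx status codes
    r.encoding = r.apparent_encoding   # decode with the encoding guessed from the page body
    print(r.text[:1000])               # show the first 1000 characters
except requests.RequestException as e:
    print("Crawl failed:", e)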
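Why set r.encoding = r.apparent_encoding? When a text/* response omits a charset in its Content-Type header, Requests falls back to ISO-8859-1 for r.text, which garbles Chinese pages. The stdlib-only sketch below imitates that situation with a made-up snippet, assumed to be served as GBK (it is not taken from JD), and compares the two decodings; it does not call Requests' own detector, and decode_body is a hypothetical helper.

```python
# Hypothetical page body; assumed to be GBK-encoded with no charset in the headers.
SAMPLE_TEXT = "京东商品页面"
raw = SAMPLE_TEXT.encode("gbk")

def decode_body(body: bytes, encoding: str) -> str:
    # errors="replace" keeps the sketch from crashing on undecodable bytes.
    return body.decode(encoding, errors="replace")

fallback = decode_body(raw, "iso-8859-1")  # roughly what r.text shows with the ISO-8859-1 fallback
detected = decode_body(raw, "gbk")         # what r.text shows if apparent_encoding detects GBK

print(fallback == SAMPLE_TEXT)  # False: every GBK byte became a separate Latin-1 character
print(detected == SAMPLE_TEXT)  # True
```

Each Chinese character is two bytes in GBK, so the Latin-1 decoding produces twice as many (meaningless) characters.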

2 Crawling a Dangdang product page

Target page: http://product.dangdang.com/26487763.html

import requests

url = 'http://product.dangdang.com/26487763.html'
try:
    r = requests.get(url, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException as e:  # subclass of IOError, so the original handler caught it too
    print(str(e))

This raises an error:
HTTPConnectionPool(host='127.0.0.1', port=80): Max retries exceeded with url: /26487763.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10fc390>: Failed to establish a new connection: [Errno 111] Connection refused',))

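Cause: Dangdang refuses requests that do not look like they come from a normal browser.
Inspect the original HTTP request headers:
print(r.request.headers)
By default Requests sends a User-Agent of the form python-requests/<version>, which identifies the client as a script rather than a browser.
Improved code: send a reasonable, browser-like HTTP request header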

import requests

url = 'http://product.dangdang.com/26487763.html'
try:
    kv = {'User-Agent': 'Mozilla/5.0'}  # identify as a browser instead of python-requests
    r = requests.get(url, headers=kv, timeout=10)
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except requests.RequestException as e:
    print(str(e))

With this header in place, the page is crawled normally.
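The header can also be inspected offline before anything is sent. The sketch below swaps in the standard library's urllib.request.Request (not Requests) to attach the same browser-like User-Agent used in the fix above and read it back without any network call; build_request is a hypothetical helper.

```python
import urllib.request

URL = "http://product.dangdang.com/26487763.html"

def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Hypothetical helper: create (but do not send) a GET request with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = build_request(URL)

# urllib stores header names via str.capitalize(), so "User-Agent" is kept as "User-agent".
print(req.get_header("User-agent"))  # Mozilla/5.0
print(req.get_method())              # GET (no request body was given)
print(req.full_url)
```

Passing req to urllib.request.urlopen() would actually send it; with Requests, the equivalent is the headers=kv argument shown above.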

3 Submitting keywords to the Baidu and 360 search engines

Baidu keyword interface: http://www.baidu.com/s?wd=keyword
Implementation:

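import requests

keyword = "python"
try:
    kv = {'wd': keyword}  # query parameters, sent as ?wd=python
    r = requests.get("http://www.baidu.com/s", params=kv, timeout=10)
    print(r.request.url)  # URL of the request that was actually sent
    r.raise_for_status()
    print(len(r.text))    # length of the returned page
except requests.RequestException as e:
    print(str(e))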

Running it prints the requested URL and the length of the returned page.
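The params=kv argument turns the dictionary into the query string, percent-encoding non-ASCII keywords as UTF-8. As a sketch of what that produces, the stdlib urllib.parse.urlencode can build the same kind of URL offline; build_search_url is a hypothetical helper, and for simple keywords its output should match what Requests builds, but it does not call Requests itself.

```python
from urllib.parse import unquote, urlencode

def build_search_url(base: str, param: str, keyword: str) -> str:
    """Hypothetical helper: append ?param=keyword, percent-encoding the keyword as UTF-8."""
    return base + "?" + urlencode({param: keyword})

print(build_search_url("http://www.baidu.com/s", "wd", "python"))
# http://www.baidu.com/s?wd=python

url = build_search_url("http://www.baidu.com/s", "wd", "网络爬虫")
print(url)                             # the Chinese keyword appears as %XX escapes
print(unquote(url).endswith("网络爬虫"))  # True: decoding the escapes restores the keyword

print(build_search_url("http://www.so.com/s", "q", "python requests"))
# http://www.so.com/s?q=python+requests  (spaces become +)
```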
360 keyword interface: http://www.so.com/s?q=keyword
Implementation:

import requests

keyword = "Linux"
try:
    kv = {'q': keyword}  # 360 uses q instead of Baidu's wd
    r = requests.get("http://www.so.com/s", params=kv, timeout=10)
    print(r.request.url)
    r.raise_for_status()
    print(len(r.text))
except requests.RequestException as e:
    print(str(e))
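Printing r.request.url shows what was sent. To check the keyword programmatically, the query string can be parsed back with urllib.parse; keyword_from_url is a hypothetical helper, and the example URLs are the ones the code above would build before any redirect (the real final URL may differ if the site redirects).

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

def keyword_from_url(url: str, param: str) -> Optional[str]:
    """Hypothetical helper: first value of `param` in the URL's query string, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None

print(keyword_from_url("http://www.so.com/s?q=Linux", "q"))        # Linux
print(keyword_from_url("http://www.baidu.com/s?wd=python", "wd"))  # python
print(keyword_from_url("http://www.so.com/s?q=Linux", "wd"))       # None
```

In the script above it would be called as keyword_from_url(r.request.url, 'q'); parse_qs also undoes the + and %XX encoding.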

4 Crawling and saving web images

Web image links have the form:
http://FQDN/picture.jpg
Xiaohuar (校花网): http://www.xiaohuar.com
Pick an image URL: http://www.xiaohuar.com/d/file/20141116030511162.jpg
Implementation:

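import os
import requests

url = "http://www.xiaohuar.com/d/file/20141116030511162.jpg"
save_dir = "D:/pics"                               # target directory (a Windows path)
path = os.path.join(save_dir, url.split('/')[-1])  # save path, named after the original image file
try:
    os.makedirs(save_dir, exist_ok=True)           # create the directory if it does not exist
    if not os.path.exists(path):
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        with open(path, 'wb') as f:                # binary mode: r.content is bytes
            f.write(r.content)                     # the with block closes the file
        print("File saved")
    else:
        print("File already exists")
except (requests.RequestException, OSError) as e:
    print(str(e))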

Checking the directory confirms that the image has been saved.
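The code names the file with url.split('/')[-1], which keeps any query string (for example a.jpg?v=2) in the filename. A slightly more robust sketch parses the URL path first; the helper names, the fallback name download.bin, and the ?v=2 variant of the URL are hypothetical.

```python
import os
import posixpath
from urllib.parse import urlsplit

def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Hypothetical helper: last path segment of the URL, ignoring query string and fragment."""
    name = posixpath.basename(urlsplit(url).path)
    return name or default  # URLs ending in "/" have no file name

def save_path(directory: str, url: str) -> str:
    # os.path.join inserts the separator that matches the current OS.
    return os.path.join(directory, filename_from_url(url))

url = "http://www.xiaohuar.com/d/file/20141116030511162.jpg"
print(filename_from_url(url))           # 20141116030511162.jpg
print((url + "?v=2").split("/")[-1])    # 20141116030511162.jpg?v=2  (naive split keeps the query)
print(filename_from_url(url + "?v=2"))  # 20141116030511162.jpg
print(save_path("pics", url))           # pics/20141116030511162.jpg on Linux/macOS
```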

5 IP address geolocation lookup

Query interface of the IP geolocation site: http://www.ip138.com/ips138.asp?ip=
Implementation:

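import requests

# Note: this queries ip38.com's ip.php endpoint, not the ip138.com interface listed above.
url = "http://www.ip38.com/ip.php?ip="
try:
    r = requests.get(url + '104.193.88.77', timeout=10)  # append the IP address to look up
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text)
except requests.RequestException as e:
    print(str(e))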
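Concatenating the raw string onto the query URL works, but a malformed address would still be sent to the site. A small sketch that validates the address with the standard ipaddress module and URL-encodes it first; build_ip_query is a hypothetical helper, and the base URL is the ip38.com endpoint from the code above.

```python
import ipaddress
from urllib.parse import urlencode

BASE = "http://www.ip38.com/ip.php"

def build_ip_query(ip: str, base: str = BASE) -> str:
    """Hypothetical helper: validate an IPv4/IPv6 address, then build the ?ip=... lookup URL."""
    addr = ipaddress.ip_address(ip.strip())  # raises ValueError for malformed input
    return base + "?" + urlencode({"ip": str(addr)})

print(build_ip_query("104.193.88.77"))  # http://www.ip38.com/ip.php?ip=104.193.88.77

try:
    build_ip_query("104.193.88.777")    # last octet out of range
except ValueError as e:
    print("invalid address:", e)
```

The returned string can be passed straight to requests.get().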

6 Submitting the Youdao Translate form

Open Youdao Translate and, in the browser's developer tools, click "Network" and then "XHR" to find the request that returns the translation data:

import requests

def get_translate_data(word=None):
    url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule"
    # POST parameters go in the request body, so build them as a dict
    # (values copied from the request captured in the developer tools)
    form_data = {'i': word,
                 'from': 'AUTO',
                 'to': 'AUTO',
                 'smartresult': 'dict',
                 'client': 'fanyideskweb',
                 'salt': '15569272902260',
                 'sign': 'b2781ea3e179798436b2afb674ebd223',
                 'ts': '1556927290226',
                 'bv': '94d71a52069585850d26a662e1bcef22',
                 'doctype': 'json',
                 'version': '2.1',
                 'keyfrom': 'fanyi.web',
                 'action': 'FY_BY_REALTlME'
                 }
    # submit the form data
    response = requests.post(url, data=form_data, timeout=10)
    response.raise_for_status()
    # parse the JSON response body into a dict
    content = response.json()
    # print the translated text
    print(content['translateResult'][0][0]['tgt'])

if __name__ == '__main__':
    word = input("Enter the text to translate: ")
    get_translate_data(word)

Running the script prompts for a piece of text and prints its translation.
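The script prints only content['translateResult'][0][0]['tgt'], the first segment. Assuming, from the nested-list shape the code indexes into, that longer input can come back split into several segments, a hedged sketch that joins every tgt field is below. It is tested against a hand-written sample, not real Youdao output, and extract_translation is a hypothetical helper.

```python
import json

# Hand-written sample shaped like the structure the script indexes into; not real API output.
SAMPLE_RESPONSE = json.dumps({
    "translateResult": [
        [{"src": "hello", "tgt": "你好"}],
        [{"src": "world", "tgt": "世界"}],
    ],
})

def extract_translation(text: str, sep: str = "\n") -> str:
    """Join the 'tgt' field of every segment; raises KeyError if 'translateResult' is missing."""
    data = json.loads(text)
    paragraphs = []
    for paragraph in data["translateResult"]:
        # Each inner list holds the segments of one paragraph.
        paragraphs.append("".join(seg.get("tgt", "") for seg in paragraph))
    return sep.join(paragraphs)

print(extract_translation(SAMPLE_RESPONSE))
```

In the script, extract_translation(response.text) would replace the single-index print.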
