Python: Crawler Notes

程序员文章站 2022-04-14 17:04:04

# -*- coding: utf-8 -*-
# Collect detail-page links from the paginated listing on model61.com.
# Requires: requests, beautifulsoup4, lxml
import requests
import urllib3
from bs4 import BeautifulSoup

# verify=False is used below, so silence the resulting InsecureRequestWarning.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Proxy addresses are elided in the original; fill in your own before running.
proxies = {"http": "http:.............................",
           "https": "https:............................."}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
}


def get_bs(url):
    """Fetch a URL through the proxy and return the parsed page."""
    res = requests.get(url, proxies=proxies, headers=headers, verify=False)
    bs = BeautifulSoup(res.content, 'lxml')
    return bs


def get_first_url():
    """Return absolute URLs for every "dt a" link whose href contains "php"."""
    first_url_list = []
    page = 213  # number of listing pages to walk
    for page_no in range(1, page + 1):
        root_url = "https://www.model61.com/mold.php?page={}".format(page_no)
        bs = get_bs(root_url)
        # A separate loop variable avoids shadowing the page counter.
        for a in bs.select("dt a"):
            src = a.get('href')
            if src and "php" in src:  # skip anchors that have no href
                first_url = "https://www.model61.com/{}".format(src)
                first_url_list.append(first_url)
    return first_url_list


if __name__ == '__main__':
    urls = get_first_url()
    print(len(urls), "links collected")
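
The listing above builds each absolute URL by formatting the href into the site root, which produces a double slash whenever an href already starts with "/". As a minimal sketch of an alternative, and not the author's method, the same two steps (building the listing-page URLs and filtering hrefs) can be written with `urllib.parse.urljoin` from the standard library. The names `BASE_URL`, `page_urls` and `absolute_php_links` are hypothetical, and the sample hrefs such as `moldview.php?id=1` are invented for illustration; they are not taken from the site.

```python
from urllib.parse import urljoin

# Site root used by the listing above; the constant name is an assumption.
BASE_URL = "https://www.model61.com/"


def page_urls(last_page):
    """Return the listing-page URLs for pages 1..last_page."""
    return [urljoin(BASE_URL, "mold.php?page={}".format(n))
            for n in range(1, last_page + 1)]


def absolute_php_links(hrefs):
    """Keep hrefs that contain 'php' and resolve them against BASE_URL."""
    links = []
    for href in hrefs:
        if href and "php" in href:  # skip missing hrefs and non-PHP links
            links.append(urljoin(BASE_URL, href))
    return links


if __name__ == "__main__":
    print(page_urls(2))
    # Hypothetical hrefs, standing in for the values of a.get('href').
    print(absolute_php_links(["moldview.php?id=1", "/about.html", None]))
```

Because these helpers work on plain strings, they can be checked without any network access; in the crawler they would take the hrefs returned by `bs.select("dt a")`.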
