Practical tips for Python development
Changing the pip package index
# Quick setup from cmd on Windows
python -m pip install -U pip # upgrade pip to the latest version
pip config list # show the current pip configuration
pip config set global.index-url http://mirrors.aliyun.com/pypi/simple/
pip config set install.trusted-host mirrors.aliyun.com
pip config list # check that the settings were written
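The two `pip config set` commands store their values in pip's per-user configuration file (pip.ini on Windows). As a rough sketch of what ends up in that file, the snippet below builds the equivalent INI text with the standard library. The helper name `build_pip_config` is made up for illustration, and the file's exact location depends on the system.

```python
import configparser
import io


def build_pip_config(index_url: str, trusted_host: str) -> str:
    """Return pip.ini-style text matching the two `pip config set` commands."""
    parser = configparser.ConfigParser()
    # `pip config set global.index-url ...` becomes a key in the [global] section
    parser["global"] = {"index-url": index_url}
    # `pip config set install.trusted-host ...` becomes a key in the [install] section
    parser["install"] = {"trusted-host": trusted_host}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_pip_config("http://mirrors.aliyun.com/pypi/simple/",
                           "mirrors.aliyun.com"))
```

Comparing this text with the output of `pip config list` is a quick way to confirm that both settings were saved.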
Changing the pip install path
python -m site
python -m site --help
Locate site.py and edit these two values:
USER_SITE = r"E:\A_DevelopTool\Python39\Lib\site-packages"
USER_BASE = r"E:\A_DevelopTool\Python39\Scripts"
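Before editing anything, it can help to check which values the interpreter currently uses. This is a minimal sketch using the standard `site` module; the function name `describe_user_paths` is only illustrative.

```python
import site


def describe_user_paths() -> dict:
    """Collect the per-user install locations reported by the site module."""
    return {
        # Base directory for per-user installs (USER_BASE)
        "user_base": site.getuserbase(),
        # Per-user site-packages directory (USER_SITE)
        "user_site": site.getusersitepackages(),
        # Whether the per-user site-packages directory is enabled at all
        "enabled": site.ENABLE_USER_SITE,
    }


if __name__ == "__main__":
    for key, value in describe_user_paths().items():
        print(f"{key}: {value}")
```

Running it again after the edit shows whether the new paths were picked up.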
Installing lxml on Python 3.9.0
1. Install wheel
pip install wheel
2. Download lxml-4.5.2-cp39-cp39-win_amd64.whl
Unofficial Windows Binaries for Python Extension Packages: https://www.lfd.uci.edu/~gohlke/pythonlibs/
3. Open a command prompt in the directory that contains the downloaded lxml-4.5.2-cp39-cp39-win_amd64.whl and run the install
pip install lxml-4.5.2-cp39-cp39-win_amd64.whl
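A downloaded wheel only installs when the tags in its file name match the interpreter: cp39 means CPython 3.9 and win_amd64 means 64-bit Windows. The sketch below splits a wheel file name into its fields and compares the Python tag with the running interpreter. The helper names are hypothetical, and it assumes a CPython interpreter and a file name without the optional build tag.

```python
import sys


def parse_wheel_name(filename: str) -> dict:
    """Split 'name-version-python-abi-platform.whl' into its fields."""
    stem = filename[:-4] if filename.endswith(".whl") else filename
    # The three compatibility tags are the last three dash-separated fields
    rest, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    # Assumption: no optional build tag between the version and the tags
    name, version = rest.split("-", 1)
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def current_python_tag() -> str:
    """Tag of the running interpreter, assuming CPython (for example 'cp39')."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def matches_interpreter(filename: str) -> bool:
    """True when the wheel's Python tag equals the running interpreter's tag."""
    return parse_wheel_name(filename)["python_tag"] == current_python_tag()


if __name__ == "__main__":
    wheel = "lxml-4.5.2-cp39-cp39-win_amd64.whl"
    print(parse_wheel_name(wheel))
    print("matches this interpreter:", matches_interpreter(wheel))
```

If the check reports a mismatch, pick the wheel built for the installed Python version instead of forcing the install.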
Installing Scrapy on Python 3.9.0
pip install wheel
Download Twisted-20.3.0-cp39-cp39-win_amd64.whl
pip install Twisted-20.3.0-cp39-cp39-win_amd64.whl
pip install pywin32
pip install scrapy
scrapy startproject project_name
cd project_name
scrapy genspider spider_name www.xxx.com
scrapy crawl spider_name
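Garbled Chinese text in XPath results
1. Set the encoding on the requests response before reading it: response.encoding = response.apparent_encoding
2. Read the body as bytes with response.content instead of response.text
3. Re-encode the extracted text directly: title = title.encode('iso-8859-1').decode('GBK')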
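The third fix works because ISO-8859-1 maps every byte to exactly one character, so text decoded with the wrong codec can be turned back into the original bytes and decoded again as GBK. Here is a small stdlib-only illustration with a made-up sample string; the function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(garbled: str, wrong: str = "iso-8859-1",
                    actual: str = "gbk") -> str:
    """Undo a wrong decode: recover the raw bytes, then decode them correctly."""
    return garbled.encode(wrong).decode(actual)


# Simulate a GBK page that was decoded as ISO-8859-1 by mistake
original = "中文标题"
raw_bytes = original.encode("gbk")
garbled = raw_bytes.decode("iso-8859-1")

fixed = repair_mojibake(garbled)

if __name__ == "__main__":
    print(repr(garbled))
    print(fixed)
```

The same round trip fails with a decode error if the page is not actually GBK, which is a hint to try the first two fixes instead.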
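Passing data between Scrapy requests
def parse_detail(self, response):
    # Retrieve the item that parse attached to the request
    item = response.meta['item']
    yield item

def parse(self, response):
    # Issue a new request and pass the item along through meta
    yield scrapy.Request(newurl, callback=self.parse_detail, meta={'item': items})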
Original article: https://blog.csdn.net/ruancexiaoming/article/details/109028436