Fixing garbled Chinese text in web crawlers
程序员文章站
2022-06-26 17:15:09
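1. requests
When a text response's Content-Type header carries no charset, requests falls back to ISO-8859-1, so Chinese pages come out garbled in response.text. Set the response encoding to the one detected from the body instead. response.apparent_encoding is guessed from the content by charset_normalizer or chardet:
```python
response.encoding = response.apparent_encoding
```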
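A stdlib-only sketch of why the wrong encoding produces garbage, using a made-up GBK-encoded string in place of a real page body (no network request is made). Decoding GBK bytes as ISO-8859-1 yields mojibake. Decoding with the real encoding, which is what setting response.encoding achieves, restores the text:

```python
# Hypothetical page body: Chinese text encoded as GBK, as a Chinese site might serve it
raw = '中文乱码'.encode('gbk')

# Roughly what response.text gives when requests falls back to ISO-8859-1
garbled = raw.decode('iso-8859-1')

# Decoding with the page's real encoding gives readable text
fixed = raw.decode('gbk')

# Text already decoded with the wrong codec can be repaired: ISO-8859-1 maps
# every byte to one character, so re-encoding recovers the original bytes exactly
recovered = garbled.encode('iso-8859-1').decode('gbk')

print(fixed == recovered)  # True
```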
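2. Scrapy
Add a process_response method to a downloader middleware (here, the same middleware that sets a random User-Agent) and rebuild each response with an explicit encoding: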
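```python
import random

from scrapy.http import HtmlResponse

# Assumed to be defined elsewhere (e.g. settings.py); placeholder value shown here
USER_AGENT_LIST = ['Mozilla/5.0 (Windows NT 10.0; Win64; x64)']

class RandomUserAgentMiddleware(object):
    def process_request(self, request, spider):
        # Set a random User-Agent on each outgoing request
        ua = random.choice(USER_AGENT_LIST)
        request.headers.setdefault('User-Agent', ua)

    def process_response(self, request, response, spider):
        # Rebuild the response so its body is decoded as GB2312,
        # keeping the original status code and headers
        return HtmlResponse(
            url=response.url,
            status=response.status,
            headers=response.headers,
            body=response.body,
            encoding='GB2312',
            request=request,
        )
```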
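Scrapy only calls a downloader middleware that is enabled in the project settings. A minimal settings.py sketch, assuming the class lives in a hypothetical myproject/middlewares.py (replace myproject with the real project package). The number sets the middleware's order relative to the others; 543 is the value used in Scrapy's own project template:

```python
# settings.py
# 'myproject' is a placeholder for the actual Scrapy project package name
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RandomUserAgentMiddleware': 543,
}
```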
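If GB2312 does not fix the garbling, switch to another encoding that matches what the site actually serves, such as GBK (a superset of GB2312 that covers more characters) or UTF-8.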
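Rather than switching the hard-coded encoding by hand, one option is to try a short list of candidates and keep the first one that decodes the body without errors. This is a stdlib-only sketch. The function name guess_encoding and the candidate list are illustrative assumptions, not part of requests or Scrapy:

```python
from typing import Optional


def guess_encoding(body: bytes, candidates=('utf-8', 'gbk', 'gb18030')) -> Optional[str]:
    """Return the first candidate encoding that decodes body without errors, else None."""
    for enc in candidates:
        try:
            body.decode(enc)  # strict mode: raises on any invalid byte sequence
        except UnicodeDecodeError:
            continue
        return enc
    return None


print(guess_encoding('中文乱码'.encode('gbk')))    # gbk
print(guess_encoding('中文乱码'.encode('utf-8')))  # utf-8
```

Order matters. UTF-8 is strict, so non-UTF-8 Chinese text usually fails to decode as UTF-8, while many arbitrary byte sequences happen to be valid GBK. That is why UTF-8 is tried first. The result is only a heuristic. When a charset is present in the Content-Type header or the page's meta tag, it is more reliable. In the Scrapy middleware, the returned name could be passed as encoding= in place of 'GB2312'.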
Article URL: https://blog.csdn.net/weixin_42156283/article/details/110491336