Asynchronous MySQL inserts with the Scrapy framework
- Scrapy is known for its excellent crawl speed, but if we insert data into MySQL synchronously, the pipeline keeps blocking and noticeably drags down crawling efficiency. Twisted, which Scrapy is built on, ships an asynchronous database module, adbapi.
- from twisted.enterprise import adbapi
- A pipeline that inserts data asynchronously:
import datetime

from pymysql import cursors
from twisted.enterprise import adbapi


class BookTwistedPipeline(object):
    def __init__(self):
        params = {
            'host': 'localhost',
            'port': 3306,
            'user': 'root',
            'password': 'mysql',
            'charset': 'utf8',
            'database': 'book',
            'cursorclass': cursors.DictCursor
        }
        # Connection pool: "pymysql" is the DB-API module name, params are passed through to it
        self.dbpool = adbapi.ConnectionPool("pymysql", **params)
        self.sql = """insert into book_detail values(0,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""

    def process_item(self, item, spider):
        # runInteraction schedules insert_sql on Twisted's asynchronous connection pool
        defer = self.dbpool.runInteraction(self.insert_sql, item)
        defer.addErrback(self.handle_err, item, spider)  # error-handling callback
        return item

    def insert_sql(self, cursor, item):
        cursor.execute(self.sql, (item['b_cate'], item['book_name'], item['b_href'], item['s_href'],
                                  item['s_cate'], item['book_href'], item['book_sku'], item['venderid'],
                                  item['author'], item['prices']))

    def handle_err(self, error, item, spider):
        if error:
            print("INFO:%s %s" % (datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'), error))