Building a Search Engine with Scrapy, a Distributed Python Crawler (2020)

程序员文章站 2022-05-23 10:52:22

...

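1. Threads and processes
The CPU is the heart of a computer and carries out all of its computation. Think of it as a factory that is always running.

Suppose the factory's power supply is limited and can feed only one workshop at a time: while one workshop is working, all the others must stop. The point of the analogy is that a single CPU core can run only one task at any given moment.

A process is like a workshop in the factory: it represents a single task the CPU can handle. At any instant the core is running one process, and the other processes are not running.

A workshop can hold many workers, who cooperate to finish one task.

A thread is like a worker in the workshop. One process can contain several threads.

The space in the workshop is shared by the workers; many rooms are open to everyone. This stands for the fact that a process's memory space is shared, and every thread in it can use that shared memory.

Rooms differ in size, though, and some hold only one person at a time, such as the toilet. While someone is inside, nobody else can enter. In the same way, while one thread is using certain shared memory, other threads must wait until it finishes before they can use that memory.

A simple way to keep others out is to put a lock on the door. Whoever arrives first locks it; whoever arrives later sees the lock and queues at the door until it opens. This is a "mutex" (short for mutual exclusion), which prevents several threads from reading and writing the same region of memory at the same time.

Other rooms can hold up to n people at once, such as the kitchen. If more than n people show up, the extra ones have to wait outside. This is like a memory region or resource that only a fixed number of threads may use at the same time.

The solution here is to hang n keys by the door. Everyone who goes in takes a key and hangs it back on the way out. Anyone who finds the key rack empty knows to queue at the door. This scheme is called a "semaphore", and it caps how many threads can use the resource at once.
It is easy to see that a mutex behaves like a semaphore with n = 1, so in principle the latter can stand in for the former. A mutex is simpler and cheaper, however, so it remains the usual choice when a resource must be held exclusively.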
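
The kitchen-with-n-keys picture can be sketched with Python's threading.Semaphore. This is only an illustration: the names KEYS, cook and run_kitchen, and the counters, are made up for the example and do not come from the article.

```python
import threading
import time

KEYS = 2                               # hypothetical: the "kitchen" holds at most 2 people
kitchen = threading.Semaphore(KEYS)    # the key rack
state_lock = threading.Lock()          # a mutex guarding the shared counters
inside = 0                             # how many workers are in the kitchen right now
max_inside = 0                         # the highest occupancy seen so far


def cook(seconds=0.05):
    """Enter the kitchen, stay a moment, and leave."""
    global inside, max_inside
    with kitchen:                      # take a key; blocks while the rack is empty
        with state_lock:               # only one thread updates the counters at a time
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(seconds)            # pretend to cook
        with state_lock:
            inside -= 1
    # leaving the "with kitchen" block hangs the key back on the rack


def run_kitchen(workers=6):
    """Send several workers to the kitchen and report the peak occupancy."""
    threads = [threading.Thread(target=cook) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_inside


print("most people in the kitchen at once:", run_kitchen())
```

The peak never exceeds KEYS, however many workers are started. Setting KEYS to 1 makes the semaphore behave like the mutex on the toilet door.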

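2. Multithreading and multiprocessing
Following the informal explanation above, the two terms mean roughly this:
Multiprocessing: several tasks are allowed to run at the same time.
Multithreading: a single task is allowed to be split into parts that run separately.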

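3. Multithreaded programming in Python
3.1 A single thread
Years ago, in the MS-DOS era, the operating system handled one task at a time. If I wanted to both listen to music and watch a movie, I had to decide which came first.

from time import ctime, sleep

def music():
    for i in range(2):
        print("I was listening to music. %s" % ctime())
        sleep(1)

def move():
    for i in range(2):
        print("I was at the movies! %s" % ctime())
        sleep(5)

if __name__ == '__main__':
    music()
    move()
    print("all over %s" % ctime())
First we listen to music. The for loop plays the music twice, each play takes 1 second, and sleep() controls how long it lasts. Then we watch a movie,
which takes 5 seconds per showing; it is so good that I watch it twice, again with a for loop. When the fun is over, I use
print("all over %s" % ctime())
to check the current time. About time for bed.
Output: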

I was listening to music. Thu Apr 17 10:47:08 2014
I was listening to music. Thu Apr 17 10:47:09 2014
I was at the movies! Thu Apr 17 10:47:10 2014
I was at the movies! Thu Apr 17 10:47:15 2014
all over Thu Apr 17 10:47:20 2014
In fact, music() and move() are better thought of as a music player and a video player; which song or movie to play should be decided when we use them. So we rework the code above:

from time import ctime, sleep

def music(func):
    for i in range(2):
        print("I was listening to %s. %s" % (func, ctime()))
        sleep(1)

def move(func):
    for i in range(2):
        print("I was at the %s! %s" % (func, ctime()))
        sleep(5)

if __name__ == '__main__':
    music(u'爱情买卖')
    move(u'阿凡达')

    print("all over %s" % ctime())

Output:

I was listening to 爱情买卖. Thu Apr 17 11:48:59 2014
I was listening to 爱情买卖. Thu Apr 17 11:49:00 2014
I was at the 阿凡达! Thu Apr 17 11:49:01 2014
I was at the 阿凡达! Thu Apr 17 11:49:06 2014
all over Thu Apr 17 11:49:11 2014
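3.2 Multiple threads
Python 3 supports threads through two standard library modules: _thread (named thread in Python 2) and threading.
_thread provides low-level, primitive threads and a simple lock; compared with threading, its features are quite limited.

3.2.1 Using the _thread module
Call start_new_thread() in the _thread module to spawn a new thread.
Let's get a feel for it with an example:

import _thread
import time

# Define the function each thread will run
def print_time(threadName, delay):
    count = 0
    while count < 5:
        time.sleep(delay)
        count += 1
        print("%s: %s" % (threadName, time.ctime(time.time())))

# Create two threads
try:
    _thread.start_new_thread(print_time, ("Thread-1", 2,))
    _thread.start_new_thread(print_time, ("Thread-2", 4,))
except Exception:
    print("Error: unable to start thread")

# Keep the main thread alive; this loop never ends, so stop the program with Ctrl+C
while 1:
    pass

print("Main Finished")  # never reached while the loop above is in place
The output is: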

Thread-1: Thu Aug 10 16:35:47 2017
Thread-2: Thu Aug 10 16:35:49 2017
Thread-1: Thu Aug 10 16:35:49 2017
Thread-1: Thu Aug 10 16:35:51 2017
Thread-2: Thu Aug 10 16:35:53 2017
Thread-1: Thu Aug 10 16:35:53 2017
Thread-1: Thu Aug 10 16:35:55 2017
Thread-2: Thu Aug 10 16:35:57 2017
Thread-2: Thu Aug 10 16:36:01 2017
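Notice that the main thread contains:

while 1:
    pass
This keeps the main thread waiting forever (as a busy loop that burns CPU while it spins).
Without these two lines, the main thread prints the following and the program ends right away, taking the two worker threads down with it before they print anything:

Main Finished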
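3.2.2 Using the threading module
threading is built on top of _thread. Besides covering what _thread offers, it adds module-level functions such as:
threading.current_thread(): returns the current Thread object (currentThread() is the older, deprecated spelling).
threading.enumerate(): returns a list of the threads that are currently alive, meaning started and not yet finished; threads that have not been started or have already terminated are excluded.
threading.active_count(): returns the number of alive threads, the same as len(threading.enumerate()) (activeCount() is the older, deprecated spelling).
Besides these functions, the module provides the Thread class for working with threads. Thread offers the following methods:
run(): the method representing the thread's activity.
start(): starts the thread's activity.
join([timeout]): waits until the thread terminates. It blocks the calling thread until the thread whose join() was called ends, either normally or through an unhandled exception, or until the optional timeout elapses.
is_alive(): returns whether the thread is alive (the old isAlive() spelling was removed in Python 3.9).
getName(): returns the thread's name (deprecated in favor of the name attribute).
setName(): sets the thread's name (deprecated in favor of the name attribute).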
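
To see these functions and methods in action, here is a small sketch. The worker function, the Event used to hold the thread open, and the thread name "demo-worker" are all invented for the illustration.

```python
import threading


def worker(stop_event):
    """Do nothing until the main thread says stop."""
    stop_event.wait()


stop = threading.Event()
t = threading.Thread(target=worker, args=(stop,), name="demo-worker")

alive_before_start = t.is_alive()            # a thread that was never started is not alive
t.start()

# While the worker is blocked on the event, it shows up in enumerate().
names_running = [th.name for th in threading.enumerate()]
print("alive threads:", names_running)
print("active_count():", threading.active_count())
print("current thread:", threading.current_thread().name)

stop.set()                                   # let the worker finish
t.join(timeout=1.0)                          # wait for it, at most one second
alive_after_join = t.is_alive()
print("alive after join:", alive_after_join)
```

The name is read through the name attribute rather than getName(), which is the spelling current Python versions prefer.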

Creating threads directly
Continuing the music and movie example, we can create threads directly with threading.Thread, specifying the function to run and the arguments to pass:

import threading
from time import ctime, sleep

def music(func):
    for i in range(2):
        print("I was listening to %s. %s" % (func, ctime()))
        sleep(1)

def move(func):
    for i in range(2):
        print("I was at the %s! %s" % (func, ctime()))
        sleep(5)

threads = []
t1 = threading.Thread(target=music, args=(u'爱情买卖',))
threads.append(t1)
t2 = threading.Thread(target=move, args=(u'阿凡达',))
threads.append(t2)

if __name__ == '__main__':
    for t in threads:
        t.start()

    print("all over %s" % ctime())

The output is:

I was listening to 爱情买卖. Thu Aug 10 16:57:12 2017
I was at the 阿凡达! Thu Aug 10 16:57:12 2017
all over Thu Aug 10 16:57:12 2017
I was listening to 爱情买卖. Thu Aug 10 16:57:13 2017
I was at the 阿凡达! Thu Aug 10 16:57:17 2017
Building a thread class
We can also create a new subclass by inheriting directly from threading.Thread, instantiate it, and call start() to launch the new thread; start() in turn invokes the thread's run() method:

#!/usr/bin/python3

import threading
import time

exitFlag = 0

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting thread: " + self.name)
        print_time(self.name, self.counter, 5)
        print("Exiting thread: " + self.name)

def print_time(threadName, delay, counter):
    while counter:
        if exitFlag:
            return  # stop early when exitFlag is set
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

# Create the new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start the new threads
thread1.start()
thread2.start()
print("Exiting main thread")
The output is:

Starting thread: Thread-1
Starting thread: Thread-2
Exiting main thread
Thread-1: Thu Aug 10 16:48:41 2017
Thread-2: Thu Aug 10 16:48:42 2017
Thread-1: Thu Aug 10 16:48:42 2017
Thread-1: Thu Aug 10 16:48:43 2017
Thread-2: Thu Aug 10 16:48:44 2017
Thread-1: Thu Aug 10 16:48:44 2017
Thread-1: Thu Aug 10 16:48:45 2017
Exiting thread: Thread-1
Thread-2: Thu Aug 10 16:48:46 2017
Thread-2: Thu Aug 10 16:48:48 2017
Thread-2: Thu Aug 10 16:48:50 2017
Exiting thread: Thread-2
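The output raises a question: why does the main thread print its exit message right after the two threads are started? Because we did not call join(). From the main thread's point of view, thread1 and thread2 are child threads; it runs on without waiting for them, although the program itself stays alive until both (non-daemon) threads finish. Calling join() makes the main thread wait for a child thread to finish before it continues.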

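The join() method
Let's modify the code:

#!/usr/bin/python3

import threading
import time

exitFlag = 0

# Same thread class as in the previous example
class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting thread: " + self.name)
        print_time(self.name, self.counter, 5)
        print("Exiting thread: " + self.name)

def print_time(threadName, delay, counter):
    while counter:
        if exitFlag:
            return  # stop early when exitFlag is set
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

# Create the new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start the new threads, then wait for both to finish
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print("Exiting main thread")
The output becomes:

Starting thread: Thread-1
Starting thread: Thread-2
Thread-1: Thu Aug 10 16:52:07 2017
Thread-2: Thu Aug 10 16:52:08 2017
Thread-1: Thu Aug 10 16:52:08 2017
Thread-1: Thu Aug 10 16:52:09 2017
Thread-2: Thu Aug 10 16:52:10 2017
Thread-1: Thu Aug 10 16:52:10 2017
Thread-1: Thu Aug 10 16:52:11 2017
Exiting thread: Thread-1
Thread-2: Thu Aug 10 16:52:12 2017
Thread-2: Thu Aug 10 16:52:14 2017
Thread-2: Thu Aug 10 16:52:16 2017
Exiting thread: Thread-2
Exiting main thread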
This time the main thread's exit message is printed last.

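The setDaemon() method
One method is often compared with join(): setDaemon(). Let's first look at what it does:

#!/usr/bin/python3

import threading
import time

exitFlag = 0

# Same thread class as in the previous examples
class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting thread: " + self.name)
        print_time(self.name, self.counter, 5)
        print("Exiting thread: " + self.name)

def print_time(threadName, delay, counter):
    while counter:
        if exitFlag:
            return  # stop early when exitFlag is set
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

# Create the new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Mark both as daemon threads, then start them
# (setDaemon() is deprecated since Python 3.10; thread1.daemon = True is the current spelling)
thread1.setDaemon(True)
thread2.setDaemon(True)
thread1.start()
thread2.start()

print("Exiting main thread")
The output is:

Starting thread: Thread-1
Starting thread: Thread-2
Exiting main thread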
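As you can see, the program ended as soon as the main thread finished, which means the two child threads were terminated as well. That is what setDaemon() does. If main thread A creates child thread B and calls B.setDaemon(True), B is marked as a daemon thread: when A finishes, the program exits without waiting for B, whether or not B is done. In that sense it is roughly the opposite of join(). Two things deserve attention: the flag must be set before start() is called, and a thread that is not a daemon keeps the program alive until it finishes. In current Python, setDaemon() is deprecated in favor of assigning thread.daemon = True or passing daemon=True to the constructor.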
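
The "set it before start()" rule is easy to check for yourself. The sketch below is illustrative only; the background function and the variable names are invented for it.

```python
import threading
import time


def background():
    """A job that never ends on its own, like a housekeeping loop."""
    while True:
        time.sleep(0.01)


# Marked as a daemon at construction time: it will not keep the program alive.
d = threading.Thread(target=background, daemon=True)
d.start()

# Trying to change the flag after start() is rejected.
late = threading.Thread(target=time.sleep, args=(0.05,))
late.start()
try:
    late.daemon = True
    error = None
except RuntimeError as exc:
    error = exc
late.join()

print("daemon thread still running:", d.is_alive())
print("changing the flag after start() raised:", type(error).__name__)
# When the script reaches its end, the interpreter exits even though
# background() is still looping, because d is a daemon thread.
```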

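Two questions
We have now seen two ways to use threads: create a threading.Thread directly, or subclass threading.Thread, instantiate the subclass, and call start() to launch the thread. At this point two questions come up. First, why can the first approach give each thread its own function to run, while in the second every thread runs the same method, and how does that work internally? Second, in the second approach we only wrote run() and never called it ourselves, so how are run() and start() related?
Start with the relationship between start() and run(). start() is what launches the thread and gives real multithreading: the caller continues with the following code without waiting for the body of run() to finish. After start() is called, the thread is ready to run but may not be running yet; once it is scheduled on the CPU, it executes run(). run() is the thread body and contains the work the thread performs; when run() returns, the thread ends.

The source of run() is shown below. If a target (the function the thread should execute) was given, run() calls that function; if not, it does nothing. In our custom Thread subclass we overrode run(), so the program executes our version instead.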

def run(self):
    """Method representing the thread's activity.

    You may override this method in a subclass. The standard run() method
    invokes the callable object passed to the object's constructor as the
    target argument, if any, with sequential and keyword arguments taken
    from the args and kwargs arguments, respectively.

    """
    try:
        if self._target:
            self._target(*self._args, **self._kwargs)
    finally:
        # Avoid a refcycle if the thread is running a function with
        # an argument that has a member that points to the thread.
        del self._target, self._args, self._kwargs
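
The difference between calling run() yourself and calling start() can be made visible by recording which thread actually executed the body. A minimal sketch, with the Recorder class and the variable names invented for the illustration:

```python
import threading


class Recorder(threading.Thread):
    """Overrides run() and remembers which thread executed it."""

    def __init__(self):
        super().__init__()
        self.ran_in = None

    def run(self):
        self.ran_in = threading.current_thread().name


caller = threading.current_thread().name

direct = Recorder()
direct.run()              # an ordinary method call: executes in the calling thread

started = Recorder()
started.start()           # creates a new thread, which then calls run()
started.join()

# Without a subclass, the default run() simply calls the target.
results = []
t = threading.Thread(target=results.append, args=("via target",))
t.start()
t.join()

print("direct.run() executed in:", direct.ran_in)
print("started.start() executed run() in:", started.ran_in)
print("target result:", results)
```

Only start() gives you a second thread; calling run() directly is no different from calling any other function.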

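Thread synchronization
When several threads modify the same data, the result can be unpredictable. To keep the data correct, the threads have to be synchronized.
The Lock and RLock objects in the threading module provide simple synchronization. Both have an acquire() method and a release() method; operations on data that only one thread may touch at a time go between acquire() and release().
The advantage of multithreading is that several tasks can run at the same time (or at least it feels that way). But when threads share data, the data can get out of sync.
Consider this situation: every element of a list is 0; a thread called "set" changes all the elements to 1, working from back to front, while a thread called "print" reads the list from front to back and prints it.
The "print" thread may well start printing while "set" is halfway through, so the output is half 0s and half 1s. The data is out of sync. Locks were introduced to avoid this.
A lock has two states, locked and unlocked. Whenever a thread such as "set" wants to access the shared data, it must first acquire the lock. If another thread such as "print" already holds it, "set" is paused, which is called blocking; once "print" has finished and released the lock, "set" continues.
With this in place, the printed list is either all 0s or all 1s, never the awkward half-and-half.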
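The "set" and "print" scenario can be written down almost literally. This is a sketch under assumptions: the list length, the function names, and the artificial sleep that slows the writer down are all choices made for the example.

```python
import threading
import time

data = [0] * 5
lock = threading.Lock()
snapshots = []                 # what the reader saw


def set_ones():
    """The "set" thread: turn every element into 1, back to front."""
    with lock:                 # acquire() on entry, release() on exit
        for i in reversed(range(len(data))):
            data[i] = 1
            time.sleep(0.01)   # slow the update so a reader has time to interfere


def read_list():
    """The "print" thread: take a copy of the whole list."""
    with lock:
        snapshots.append(list(data))


setter = threading.Thread(target=set_ones)
reader = threading.Thread(target=read_list)
setter.start()
reader.start()
setter.join()
reader.join()

print("reader saw:", snapshots[0])
```

Whichever thread wins the lock first, the reader sees all 0s or all 1s. Remove the two with lock: lines and a mixed list becomes possible.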
Example:

#!/usr/bin/python3

import threading
import time

class myThread(threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter

    def run(self):
        print("Starting thread: " + self.name)
        # Acquire the lock to synchronize the threads
        threadLock.acquire()
        print_time(self.name, self.counter, 3)
        # Release the lock so the next thread can proceed
        threadLock.release()

def print_time(threadName, delay, counter):
    while counter:
        time.sleep(delay)
        print("%s: %s" % (threadName, time.ctime(time.time())))
        counter -= 1

threadLock = threading.Lock()
threads = []

# Create the new threads
thread1 = myThread(1, "Thread-1", 1)
thread2 = myThread(2, "Thread-2", 2)

# Start the new threads
thread1.start()
thread2.start()

# Add the threads to the thread list
threads.append(thread1)
threads.append(thread2)

# Wait for all threads to finish
for t in threads:
    t.join()
print("Exiting main thread")
The output is:

Starting thread: Thread-1
Starting thread: Thread-2
Thread-1: Thu Aug 10 20:45:59 2017
Thread-1: Thu Aug 10 20:46:00 2017
Thread-1: Thu Aug 10 20:46:01 2017
Thread-2: Thu Aug 10 20:46:03 2017
Thread-2: Thu Aug 10 20:46:05 2017
Thread-2: Thu Aug 10 20:46:07 2017
Exiting main thread
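Thread-safe queues (the queue module)
Python's queue module (named Queue in Python 2) provides synchronized, thread-safe queue classes: the FIFO (first in, first out) queue Queue, the LIFO (last in, first out) queue LifoQueue, and the priority queue PriorityQueue.
These queues handle locking internally, so they can be used directly from multiple threads and are a convenient way to synchronize threads.
Commonly used methods of a Queue object:
Queue.qsize(): returns the approximate size of the queue.
Queue.empty(): returns True if the queue is empty, otherwise False.
Queue.full(): returns True if the queue is full, otherwise False; "full" is relative to the maxsize given when the queue was created.
Queue.get([block[, timeout]]): removes and returns an item from the queue; timeout is how long to wait.
Queue.get_nowait(): equivalent to Queue.get(False).
Queue.put(item[, block[, timeout]]): puts item into the queue; timeout is how long to wait.
Queue.put_nowait(item): equivalent to Queue.put(item, False).
Queue.task_done(): called after finishing a piece of work, to tell the queue that an item fetched earlier has been fully processed.
Queue.join(): blocks until every item put into the queue has been fetched and marked with task_done().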

#!/usr/bin/python3

import queue
import threading
import time

exitFlag = 0

class myThread(threading.Thread):
    def __init__(self, threadID, name, q):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.q = q

    def run(self):
        print("Starting thread: " + self.name)
        process_data(self.name, self.q)
        print("Exiting thread: " + self.name)

def process_data(threadName, q):
    while not exitFlag:
        # The lock makes the empty() check and the get() one atomic step
        queueLock.acquire()
        if not workQueue.empty():
            data = q.get()
            queueLock.release()
            print("%s processing %s" % (threadName, data))
        else:
            queueLock.release()
        time.sleep(1)

threadList = ["Thread-1", "Thread-2", "Thread-3"]
nameList = ["One", "Two", "Three", "Four", "Five"]
queueLock = threading.Lock()
workQueue = queue.Queue(10)
threads = []
threadID = 1

# Create the new threads
for tName in threadList:
    thread = myThread(threadID, tName, workQueue)
    thread.start()
    threads.append(thread)
    threadID += 1

# Fill the queue
queueLock.acquire()
for word in nameList:
    workQueue.put(word)
queueLock.release()

# Wait for the queue to empty
while not workQueue.empty():
    pass

# Tell the threads it is time to exit
exitFlag = 1

# Wait for all threads to finish
for t in threads:
    t.join()
print("Exiting main thread")
The result differs from run to run, depending on which thread gets the lock first. One run produced:

Starting thread: Thread-1
Starting thread: Thread-2
Starting thread: Thread-3
Thread-2 processing One
Thread-3 processing Two
Thread-1 processing Three
Thread-3 processing Four
Thread-1 processing Five
Exiting thread: Thread-3
Exiting thread: Thread-2
Exiting thread: Thread-1
Exiting main thread
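
The task_done() and join() methods listed earlier are not used by the example above, which polls empty() in a loop instead. As a rough alternative sketch (not the original author's code), the same work can be organized around them. The worker and process functions and the None "stop" marker are assumptions made for this illustration.

```python
import queue
import threading


def worker(q, results):
    """Take items off the queue until the None stop marker arrives."""
    while True:
        item = q.get()                 # blocks until an item is available
        try:
            if item is None:           # hypothetical stop marker chosen for this sketch
                return
            results.append(item.upper())
        finally:
            q.task_done()              # one task_done() for every get()


def process(words, n_workers=3):
    q = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(q, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for word in words:
        q.put(word)
    q.join()                           # returns once every word has been marked done
    for _ in threads:
        q.put(None)                    # one stop marker per worker
    for t in threads:
        t.join()
    return sorted(results)


print(process(["One", "Two", "Three", "Four", "Five"]))
```

No extra lock is needed here, because Queue does its own locking, and the main thread sleeps inside q.join() instead of spinning on empty().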

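Author: 文哥的学习日记
Link: https://www.jianshu.com/p/f58ff94ec92b
Source: 简书 (Jianshu)
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, credit the source.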

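It took me a long time to understand multithreading in Python, and most of the articles I found were not easy to follow. So here I try to give you a first feel for multithreading through simple examples.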

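Multiple threads

Technology moves on, and CPUs keep getting faster. The CPU complains: such a trivial job takes up a whole slice of my time, when I could easily do several things at once. So operating systems entered the multitasking era, and listening to music while eating hot pot is no longer a dream.

Python provides two modules for multithreading, thread (_thread in Python 3) and threading. thread has some shortcomings that threading makes up for, so to save your time we will go straight to threading.

Let's keep reworking the music and movie example and bring in threading to play both at the same time:

#coding=utf-8
import threading
from time import ctime, sleep

def music(func):
    for i in range(2):
        print("I was listening to %s. %s" % (func, ctime()))
        sleep(1)

def move(func):
    for i in range(2):
        print("I was at the %s! %s" % (func, ctime()))
        sleep(5)

threads = []
t1 = threading.Thread(target=music, args=(u'爱情买卖',))
threads.append(t1)
t2 = threading.Thread(target=move, args=(u'阿凡达',))
threads.append(t2)

if __name__ == '__main__':
    for t in threads:
        t.setDaemon(True)  # deprecated since Python 3.10; t.daemon = True is the current spelling
        t.start()

    print("all over %s" % ctime())
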

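import threading

First import the threading module; this is the prerequisite for using threads.

threads = []

t1 = threading.Thread(target=music, args=(u'爱情买卖',))

threads.append(t1)

We create a list named threads, then create thread t1 with threading.Thread(). target=music tells the thread to call music, and args supplies the arguments passed to music. The finished thread t1 is appended to threads.

Thread t2 is created the same way and also appended to threads.

for t in threads:

    t.setDaemon(True)

    t.start()

Finally a for loop walks through the list, which now holds the two threads t1 and t2.

setDaemon()

setDaemon(True) marks a thread as a daemon thread. It must be called before start(). A thread that is not a daemon keeps the program alive until it finishes, whereas a daemon thread does not. After the child threads start, the parent thread carries on too; once it has executed its last statement, print("all over %s" % ctime()), it exits without waiting for the child threads, and the child threads end along with it.

start()

Starts the thread's activity.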

Output:

I was listening to 爱情买卖. Thu Apr 17 12:51:45 2014 I was at the 阿凡达! Thu Apr 17 12:51:45 2014 all over Thu Apr 17 12:51:45 2014
The result shows that the child threads (music and move) and the main thread (print("all over %s" % ctime())) all start at the same moment, but because the main thread finishes and exits, the child threads are terminated too.

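Let's adjust the program further:

…
if __name__ == '__main__':
    for t in threads:
        t.setDaemon(True)
        t.start()

    t.join()

    print("all over %s" % ctime())

We only added a join() call, which waits for a thread to terminate: until the joined child thread finishes, the thread that called join() stays blocked.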

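Note: join() is placed outside the for loop, so it is called only on t, which after the loop refers to the last thread started (t2, the movie). The main thread therefore waits for the movie thread; that covers both threads here only because the movie runs longer than the music. To wait for every thread regardless of how long each takes, call join() on each one.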
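
A pattern that does not depend on which thread happens to finish last is to join every thread in a second loop. A small sketch, where the task function, the thread names, and the shortened sleep times are made up for the illustration:

```python
import threading
import time

finished = []                       # names of the tasks that ran to completion


def task(name, seconds):
    time.sleep(seconds)
    finished.append(name)


# The slow thread is started first, so joining only the last one would miss it.
threads = [
    threading.Thread(target=task, args=("slow", 0.2), daemon=True),
    threading.Thread(target=task, args=("fast", 0.05), daemon=True),
]

for t in threads:
    t.start()

for t in threads:                   # wait for each thread in turn
    t.join()

print("finished:", sorted(finished))
```

Had only the last thread ("fast") been joined, the main thread would have moved on while "slow" was still sleeping, and as a daemon thread "slow" would have been cut off when the program exited.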

Output:

I was listening to 爱情买卖. Thu Apr 17 13:04:11 2014 I was at the 阿凡达! Thu Apr 17 13:04:11 2014

I was listening to 爱情买卖. Thu Apr 17 13:04:12 2014
I was at the 阿凡达! Thu Apr 17 13:04:16 2014
all over Thu Apr 17 13:04:21 2014
The output shows that music and move start at the same time.

The run starts at 13:04:11 and the main thread's final print comes at 13:04:21, so the total is 10 seconds, 2 seconds less than the 12 seconds of the single-threaded version. Now let's raise the sleep() in music to 4 seconds.

…
def music(func):
    for i in range(2):
        print("I was listening to %s. %s" % (func, ctime()))
        sleep(4)
…
Output:


I was listening to 爱情买卖. Thu Apr 17 13:11:27 2014I was at the 阿凡达! Thu Apr 17 13:11:27 2014

I was listening to 爱情买卖. Thu Apr 17 13:11:31 2014
I was at the 阿凡达! Thu Apr 17 13:11:32 2014
all over Thu Apr 17 13:11:37 2014
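The child threads start at 13:11:27 and the main thread finishes at 13:11:37.

Although each song went from 1 second to 4 seconds, running the script with multiple threads leaves the total time unchanged: the two movie showings (2 × 5 = 10 seconds) still take longer than the two songs (2 × 4 = 8 seconds), and the two threads run side by side.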
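
The same effect can be measured directly. The sketch below scales the times down (0.2 s per song and 0.25 s per showing instead of 4 s and 5 s) so that it finishes quickly; play and timed_run are names invented for the illustration.

```python
import threading
import time


def play(seconds, repeats=2):
    """Stand-in for music()/move(): sleep `seconds`, `repeats` times."""
    for _ in range(repeats):
        time.sleep(seconds)


def timed_run(music_seconds, movie_seconds):
    """Run both players in parallel and return the elapsed wall-clock time."""
    threads = [
        threading.Thread(target=play, args=(music_seconds,)),
        threading.Thread(target=play, args=(movie_seconds,)),
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = timed_run(0.2, 0.25)
print("elapsed: %.2f s" % elapsed)
```

The elapsed time comes out close to the longer of the two jobs (2 × 0.25 = 0.5 s), not the sum of both (0.9 s).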

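This article aims to give you a quick, intuitive feel for using threads in Python; for more detail, consult other documentation and references.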

==========================================================

Notes on class threading.Thread():

class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})

This constructor should always be called with keyword arguments. Arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.

name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.

args is the argument tuple for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If the subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.
