beanstalkd Study Notes (categories: distributed queues, beanstalkd)
2024-03-14 19:24:05
Original post: http://liaofeng-xiao.iteye.com/blog/1990577
The most direct way to learn beanstalkd is to read the official protocol document:
https://raw.github.com/kr/beanstalkd/master/doc/protocol.txt
beanstalkd is a fast, general-purpose work queue. Its protocol is simple, and it works as a lightweight piece of message middleware.
“(Beanstalkd) is a simple, fast workqueue service. Its interface is generic, but was originally designed for reducing the latency of page views in high-volume web applications by running time-consuming tasks asynchronously.”
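beanstalkd was originally designed for web applications under heavy concurrent load: time-consuming work is run asynchronously so that the request can return promptly and response latency goes down.
Use cases: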
* long running task
* intensive task
examples:
* send email
* processing image/video
advantages:
1. asynchronous / non-blocking
2. scales easily: run more workers; workers can be distributed across a number of machines
3. call functions written in other languages
How it works:
* Queues: act as a job buffer between producers and workers
* Daemon: manages the queues and decides when a job is handed to a worker
* Producer: creates a job and puts it on the queue
* Worker: takes a job from the queue and processes it
Comparison:
1. Queueing with a database: not well suited, especially at high transaction rates, because locking is needed to ensure that only one worker gets a given job.
2. ActiveMQ
3. RabbitMQ: written in Erlang, acquired by VMware
4. Amazon SQS
5. Gearman, from LiveJournal
6. ZeroMQ
7. Sparrow
Killer Features:
* tube
* priority
* delays
* TTR
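It supports priority queues, delays, time-to-run (TTR) timeouts with redelivery, and burying (setting jobs aside), which makes it well suited to distributed background jobs and scheduled tasks.
Internally it uses libevent, and client and server speak a lightweight, Memcached-like protocol, so performance is high (reported figures: enqueue 9000 jobs/second, worker 5200 jobs/second).
Although it is an in-memory queue, beanstalkd provides a binlog mechanism: when beanstalkd restarts, the current state of the jobs can be recovered from the locally recorded binlog.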
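tube:
Similar to a topic. One beanstalkd instance supports multiple tubes, each with its own producers and workers, and tubes do not affect one another. A job stays in the same tube for its whole life cycle.
Job priority: a priority is an integer from 0 to 2^32 - 1. 0 is the most urgent, and beanstalkd counts priorities below 1024 as urgent. beanstalkd keeps ready jobs in a heap ordered by priority, so a reserve command always returns the most urgent ready job, and inserting or removing a job costs O(log n).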
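The ordering rule above can be sketched with Python's heapq module. This is a toy model, not beanstalkd's own code: the class name ReadyQueue, its methods, and the first-in-first-out tie-break within one priority are assumptions made for illustration.

```python
import heapq
import itertools

URGENT_THRESHOLD = 1024  # priorities below this value count as urgent


class ReadyQueue:
    """Toy model of one tube's ready queue (hypothetical, for illustration)."""

    def __init__(self):
        self._heap = []
        self._ids = itertools.count(1)  # job id, also used as tie-breaker

    def put(self, priority, body):
        if not 0 <= priority < 2 ** 32:
            raise ValueError("priority must be in the range 0 .. 2^32 - 1")
        job_id = next(self._ids)
        # Smaller priority number sorts first; equal priorities fall back to job id.
        heapq.heappush(self._heap, (priority, job_id, body))  # O(log n)
        return job_id

    def reserve(self):
        """Pop the most urgent job, or return None if the queue is empty."""
        if not self._heap:
            return None
        priority, job_id, body = heapq.heappop(self._heap)  # O(log n)
        return job_id, priority, body

    def urgent_count(self):
        return sum(1 for priority, _, _ in self._heap if priority < URGENT_THRESHOLD)


queue = ReadyQueue()
queue.put(2000, "resize image")
queue.put(0, "send password reset email")
queue.put(2000, "transcode video")
order = [queue.reserve()[2] for _ in range(3)]
print(order)
```

The job with priority 0 comes out first even though it was put second; the two jobs with priority 2000 keep their insertion order.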
Delayed jobs: there are two ways to delay a job:
* put with delay
* release with delay
But when would you actually use a delayed job?
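One plausible answer, offered here as an assumption rather than something the original notes state, is retrying a failed job with a growing delay: the worker releases the job with a delay that doubles on each attempt. The function names and the base and cap values below are hypothetical; only the `release <id> <priority> <delay>` command line comes from the protocol.

```python
def retry_delay(attempt, base=5, cap=3600):
    """Seconds to wait before retry number `attempt` (1-based): base * 2^(attempt-1), capped."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))


def release_command(job_id, priority, attempt):
    """Build the protocol line `release <id> <priority> <delay>` for a failed job."""
    return "release %d %d %d\r\n" % (job_id, priority, retry_delay(attempt))


delays = [retry_delay(n) for n in range(1, 6)]
print(delays)                        # [5, 10, 20, 40, 80]
print(repr(release_command(42, 1000, 3)))
```

A worker would need to know how many times a job has already been retried; one option is to carry the attempt number inside the job body.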
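Timeout and redelivery: time-to-run (TTR)
If a client (worker/consumer) reserves a job and does not finish it within the TTR, that is, it does not change the job's reserved state with delete, release, or bury, beanstalkd treats the job as failed and releases it back to the ready queue so that another worker can reserve it. If a worker expects to need longer than the TTR, it can send the touch command, which makes beanstalkd restart the TTR countdown from now.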
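The TTR countdown and the effect of touch can be modelled in a few lines. This is only a sketch of the bookkeeping, with time passed in explicitly so the behaviour is easy to follow; the class ReservedJob and its methods are hypothetical and do not talk to a server.

```python
class ReservedJob:
    """Toy model of the TTR countdown for one reserved job (times in seconds)."""

    def __init__(self, ttr, now):
        self.ttr = ttr
        self.deadline = now + ttr  # the job must be deleted/released/buried before this

    def touch(self, now):
        """Restart the countdown from `now`, as the touch command does."""
        self.deadline = now + self.ttr

    def time_left(self, now):
        return max(0, self.deadline - now)

    def timed_out(self, now):
        """True once the TTR has run out; the server would then make the job ready again."""
        return now >= self.deadline


job = ReservedJob(ttr=10, now=0)
print(job.time_left(now=7))   # 3
job.touch(now=7)              # deadline moves from 10 to 17
print(job.timed_out(now=12))  # False
print(job.timed_out(now=17))  # True
```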
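buried (jobs set aside):
If a job cannot be processed for the moment, the worker can put it into the buried state. A buried job cannot be reserved by any worker. An administrator can inspect the next buried job with peek-buried (the number of buried jobs is reported by stats-tube as current-jobs-buried) and intervene manually. kick <n> moves up to n buried jobs back to the ready state in one go.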
The beanstalkd protocol:
beanstalkd uses a Memcached-like text protocol: client and server communicate in plain text. The commands fall into three groups:
1. producer
a. use <tube>
b. put <priority> <delay> <ttr> <bytes>, followed by the job body on the next line
2. worker
a. watch <tube>
b. reserve: blocks until a job is ready. With reserve-with-timeout <seconds> and a timeout of 0, beanstalkd returns a job immediately or replies TIMED_OUT.
c. delete <id>
d. release <id> <priority> <delay>
e. bury <id> <priority>
f. touch <id>
3. maintainer
a. peek <id>
b. peek-delayed
c. peek-ready
d. peek-buried
e. kick <n>
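To make the wire format of the producer commands concrete, here is a small sketch that builds the bytes a client would send. The function names and default values are hypothetical; the layout (`put <priority> <delay> <ttr> <bytes>`, a CRLF, the body, another CRLF) follows the protocol document. Python 3 is assumed.

```python
def build_use(tube):
    """Bytes for `use <tube>`."""
    return ("use %s\r\n" % tube).encode("ascii")


def build_put(body, priority=1000, delay=0, ttr=10):
    """Bytes for a put command followed by the job body."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    # <bytes> is the length of the body in bytes, not in characters,
    # and does not include the trailing \r\n.
    header = "put %d %d %d %d\r\n" % (priority, delay, ttr, len(body))
    return header.encode("ascii") + body + b"\r\n"


frame = build_put("hello world")
print(frame)  # b'put 1000 0 10 11\r\nhello world\r\n'
```

The byte count matters for non-ASCII bodies: a two-character Chinese string encoded as UTF-8 is six bytes long.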
State transition diagram:
   put with delay               release with delay
  ----------------> [DELAYED] <------------.
                        |                   |
                 kick   | (time passes)     |
                        |                   |
   put                  v     reserve       |       delete
  -----------------> [READY] ---------> [RESERVED] --------> *poof*
                       ^  ^                |  |
                       |   \  release      |  |
                       |    `-------------'   |
                       |                      |
                       | kick                 |
                       |                      |
                       |       bury           |
                    [BURIED] <---------------'
                       |
                       |  delete
                        `--------> *poof*
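The diagram can also be written down as a transition table, which makes it easy to check whether a sequence of commands is legal. The sketch below encodes only the arrows drawn in the diagram; the names TRANSITIONS, INITIAL, and run are hypothetical.

```python
# (current state, event) -> next state; None means the job is gone (*poof*).
TRANSITIONS = {
    ("delayed", "time passes"): "ready",
    ("delayed", "kick"): "ready",
    ("ready", "reserve"): "reserved",
    ("reserved", "release"): "ready",
    ("reserved", "release with delay"): "delayed",
    ("reserved", "bury"): "buried",
    ("reserved", "delete"): None,
    ("buried", "kick"): "ready",
    ("buried", "delete"): None,
}

# A job enters the diagram through one of the two put arrows.
INITIAL = {"put": "ready", "put with delay": "delayed"}


def run(events):
    """Follow `events` (the first must be a put) and return the final state."""
    state = INITIAL[events[0]]
    for event in events[1:]:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError("illegal transition: %r from state %r" % (event, state))
        state = TRANSITIONS[key]
    return state


print(run(["put", "reserve", "delete"]))                                 # None
print(run(["put with delay", "time passes", "reserve", "bury", "kick"]))  # ready
```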
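Shortcomings of beanstalkd:
1. It has no master-slave replication or failover, so it can become a single point of failure in an application. In practice, a database can be used to give jobs persistent storage.
2. Like Memcached, beanstalkd is built on libevent, but it dispatches events in a single thread and therefore cannot make effective use of multi-core CPUs. This can be overcome by running several instances on one machine.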
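If several instances run on one machine, clients need some rule for choosing among them. One simple option, sketched here as an assumption and not as anything beanstalkd provides, is to hash the tube name so that producers and workers of the same tube always land on the same instance. The function name and the port numbers are hypothetical.

```python
import zlib


def pick_instance(tube, instances):
    """Map a tube name to one (host, port) pair, the same one every time."""
    if not instances:
        raise ValueError("at least one instance is required")
    index = zlib.crc32(tube.encode("utf-8")) % len(instances)
    return instances[index]


# Hypothetical layout: three beanstalkd processes on one host.
INSTANCES = [("localhost", 11300), ("localhost", 11301), ("localhost", 11302)]
print(pick_instance("today", INSTANCES))
```

Changing the number of instances changes the mapping, so jobs already queued on the old instance for a tube would still have to be drained by a worker connected to it.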
Trying it locally:
1. Start the server: beanstalkd
2. telnet localhost 11300
Once connected, you can send commands, for example: stats
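The same stats exchange can be done from a script instead of telnet. The sketch below assumes a beanstalkd listening on localhost:11300 and a reply of the form `OK <bytes>`, a CRLF, then a YAML dictionary; the function names are hypothetical, and the parser is deliberately naive (it only splits `key: value` lines rather than using a YAML library).

```python
import socket


def parse_stats(reply):
    """Turn b'OK <bytes>\\r\\n<yaml>\\r\\n' into a dict mapping names to string values."""
    header, _, rest = reply.partition(b"\r\n")
    parts = header.split()
    if len(parts) != 2 or parts[0] != b"OK":
        raise ValueError("unexpected reply: %r" % header)
    body = rest[:int(parts[1])].decode("utf-8")
    stats = {}
    for line in body.splitlines():
        if ": " in line:  # skips the leading '---' line
            key, value = line.split(": ", 1)
            stats[key] = value
    return stats


def fetch_stats(host="localhost", port=11300):
    """Send `stats` to a running beanstalkd and return the parsed reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"stats\r\n")
        reply = b""
        while True:
            header, sep, rest = reply.partition(b"\r\n")
            if sep:
                if not header.startswith(b"OK "):
                    raise ValueError("unexpected reply: %r" % header)
                # Done once the announced body plus its trailing \r\n has arrived.
                if len(rest) >= int(header.split()[1]) + 2:
                    return parse_stats(reply)
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("server closed the connection")
            reply += chunk
```

With a server running, `fetch_stats()["current-jobs-ready"]` would give the number of ready jobs as a string.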
Questions:
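1. If a job runs out of TTR, will the worker stop processing it? Or can two workers end up working on the same job?
The latter, definitely: beanstalkd does not interrupt the first worker, it only makes the job ready again, so a second worker can reserve it.
2. How can you try beanstalkd conveniently on a local machine, for example with three terminals for producer, worker, and maintainer?
Use telnet (type `quit` to close the session), or use the beanstalkc client.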
producer:
telnet localhost 11300
stats-tube default
use today
put 1000 0 10 11
hello world
worker:
telnet localhost 11300
watch today
ignore default
reserve (or reserve-with-timeout 0)
bury <id> 1000
kick <number>
delete <id>
maintainer:
stats-tube today
peek <id>
stats-job <id>
peek-ready (a NOT_FOUND reply means the tube currently in use has no ready job; switch tubes with use)
peek-delayed
kick <number>
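Each command in the three sessions above is answered with a one-line status such as INSERTED <id>, USING <tube>, RESERVED <id> <bytes>, or NOT_FOUND. A small helper, sketched here with a hypothetical name, splits such a line into the status word and its arguments:

```python
def parse_reply(line):
    """Split a status line such as b'RESERVED 7 11\\r\\n' into (status, args)."""
    words = line.strip().decode("ascii").split()
    if not words:
        raise ValueError("empty reply")
    status, args = words[0], words[1:]
    # Numeric arguments (job ids, byte counts, counters) become ints; tube names stay strings.
    return status, [int(arg) if arg.isdigit() else arg for arg in args]


print(parse_reply(b"INSERTED 1\r\n"))     # ('INSERTED', [1])
print(parse_reply(b"RESERVED 1 11\r\n"))  # ('RESERVED', [1, 11])
print(parse_reply(b"USING today\r\n"))    # ('USING', ['today'])
print(parse_reply(b"NOT_FOUND\r\n"))      # ('NOT_FOUND', [])
```

For RESERVED, the second number is the length of the job body that follows on the next line.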
References:
1. Protocol: https://raw.github.com/kr/beanstalkd/master/doc/protocol.txt, or https://github.com/kr/beanstalkd/blob/master/doc/protocol.md
2. PPT: http://alister.github.io/presentations/Beanstalkd/
3. http://nubyonrails.com/articles/about-this-blog-beanstalk-messaging-queue
Beanstalkc:
https://github.com/earl/beanstalkc/
Beanstalkc is a simple beanstalkd client library for Python. beanstalkd is a fast, in-memory workqueue service.
Beanstalkc depends on PyYAML, but there are ways to avoid this dependency.
Beanstalkc is pure python, and is compatible with eventlet and gevent.
Usage:
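import beanstalkc
# The port must be an integer, not a string.
beanstalkd = beanstalkc.Connection(host="localhost", port=11300)
job = beanstalkd.reserve()  # blocks until a job is ready
print(job.body)
# .... (process the job here)
job.delete()  # tell beanstalkd the job is done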
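The usage above covers only the worker side. A slightly fuller sketch of both sides follows, written against the Connection methods use, put, watch, and reserve and the Job methods delete and bury. The helper names enqueue and work_once, the tube name, and the choice to bury a job when its handler raises are assumptions made for this example. The helpers take the connection as a parameter, so the block itself does not need beanstalkc or a running server.

```python
def enqueue(conn, tube, body, priority=1000, ttr=10):
    """Producer side: select the tube, then put one job. Returns the job id."""
    conn.use(tube)
    return conn.put(body, priority=priority, delay=0, ttr=ttr)


def work_once(conn, tube, handler, timeout=0):
    """Worker side: reserve at most one job and report what happened to it."""
    conn.watch(tube)
    job = conn.reserve(timeout=timeout)  # None if nothing was ready in time
    if job is None:
        return "empty"
    try:
        handler(job.body)
    except Exception:
        job.bury()    # set the job aside for manual inspection
        return "buried"
    job.delete()      # finished: remove the job from the server
    return "deleted"


# With a server running, usage might look like this:
#   import beanstalkc
#   conn = beanstalkc.Connection(host="localhost", port=11300)
#   enqueue(conn, "today", "hello world")
#   work_once(conn, "today", print)
```

Burying on failure keeps a bad job from being retried in a loop; releasing it with a delay would be the alternative when a retry is expected to succeed.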
Reference: https://github.com/earl/beanstalkc/blob/master/beanstalkc.py
tutorial: http://beanstalkc.readthedocs.org/en/latest/tutorial.html
beanstalkc is very simple: it has only two classes, Connection and Job.