epoll

程序员文章站 2022-03-16 09:58:38

Source: https://linux.die.net/man/7/epoll
Source: https://blog.csdn.net/hdutigerkin/article/details/7517390

Description

The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them.

The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors.

The following system calls are provided to create and manage an epoll instance:

epoll_create(2) creates an epoll instance and returns a file descriptor referring to that instance.
(The more recent epoll_create1(2) extends the functionality of epoll_create(2).)

Interest in particular file descriptors is then registered via epoll_ctl(2).

The set of file descriptors currently registered on an epoll instance is sometimes called an epoll set.

epoll_wait(2) waits for I/O events, blocking the calling thread if no events are currently available.

Level-triggered and edge-triggered

The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT).

The difference between the two mechanisms can be described as follows. Suppose that this scenario happens:

1. The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance.

2. A pipe writer writes 2 kB of data on the write side of the pipe.

3. A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.

4. The pipe reader reads 1 kB of data from rfd (leaving 1 kB in the pipe).

5. A call to epoll_wait(2) is done.

If the rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer; meanwhile the remote peer (here, the pipe writer) might be expecting a response based on the data it already sent.

The reason for this is that edge-triggered mode only delivers events when changes occur on the monitored file descriptor. So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer.

In the above example, an event on rfd will be generated because of the write done in 2, and that event is consumed in 3. Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.

An application that employs the EPOLLET flag should use nonblocking file descriptors to avoid having a blocking read or write starve a task that is handling multiple file descriptors.

The suggested way to use epoll as an edge-triggered (EPOLLET) interface is as follows:
i. with nonblocking file descriptors; and
ii. by waiting for an event only after read(2) or write(2) return EAGAIN.
By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll(2), and can be used wherever the latter is used since it shares the same semantics. (With level triggering, as with poll, the kernel keeps notifying you as long as the condition holds, even if you take no action.)

Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2). When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.

Example

#define MAX_EVENTS 10
struct epoll_event ev, events[MAX_EVENTS];
struct sockaddr_in local;                 /* peer address filled in by accept() */
socklen_t addrlen = sizeof(local);        /* must be initialized before accept() */
int listen_sock, conn_sock, nfds, epollfd, n;

/* Set up listening socket, 'listen_sock' (socket(),
   bind(), listen()) */

epollfd = epoll_create(10);
if (epollfd == -1) {
    perror("epoll_create");
    exit(EXIT_FAILURE);
}

ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}

for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    if (nfds == -1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
    }

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            conn_sock = accept(listen_sock,
                            (struct sockaddr *) &local, &addrlen);
            if (conn_sock == -1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                        &ev) == -1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}

Why epoll is efficient

epoll's efficiency lies in the fact that even when we use epoll_ctl to register a million handles, epoll_wait can still return quickly and hand the handles on which events occurred to user space. The reason is that when we call epoll_create, besides creating a file node in the epoll filesystem, the kernel builds a red-black tree in kernel cache to store the sockets later passed in via epoll_ctl, and also creates a linked list to hold ready events. When epoll_wait is called, it merely checks whether this ready list contains anything: if it does, it returns; if not, it sleeps, and once the timeout expires it returns even if the list is still empty. This is why epoll_wait is so efficient.

Moreover, even when we are monitoring millions of handles, usually only a small number are ready at any one time, so epoll_wait only needs to copy a handful of handles from kernel space to user space. How could that not be efficient?

So how is this ready list maintained? When we execute epoll_ctl, besides placing the socket on the red-black tree attached to the file object in the epoll filesystem, the kernel also registers a callback with its interrupt handling, telling it: when an interrupt for this handle arrives, put the handle on the ready list. So when data arrives on a socket, the kernel copies the data from the NIC into kernel memory and then inserts the socket into the ready list.

Thus one red-black tree, one ready list, and a small amount of kernel cache solve the problem of handling sockets under high concurrency. epoll_create builds the red-black tree and the ready list. epoll_ctl, when adding a socket handle, checks whether it already exists in the tree: if so it returns immediately; if not it adds the handle to the tree and registers the callback that inserts it into the ready list when an interrupt arrives. epoll_wait simply returns whatever is in the ready list.

An epoll callback-style server example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <netinet/in.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <sys/epoll.h>
#include <unistd.h>
#include <sys/types.h>

#define IPADDRESS "127.0.0.1"
#define PORT 6666
#define MAXSIZE 1024
#define LISTENQ 1024
#define FDSIZE 1000
#define EPOLLEVENTS 100
// typedef union epoll_data
// {
//   void *ptr;
//   int fd;
//   uint32_t u32;
//   uint64_t u64;
// } epoll_data_t;

// struct epoll_event
// {
//   uint32_t events;	/* Epoll events */
//   epoll_data_t data;	/* User data variable */
// } __EPOLL_PACKED;

// Create the listening socket and bind it
int socket_bind(const char *ip, int port);
// I/O multiplexing loop
void do_epoll(int listenfd);
// Dispatch ready events
void handle_events(int epollfd, struct epoll_event *events, int num, int listenfd, char *buf);
// Handle an incoming connection
void handle_accpet(int epollfd, int listenfd);
// Handle a read event
void do_read(int epollfd, int fd, char *buf);
// Handle a write event
void do_write(int epollfd, int fd, char *buf);
// Register an event
void add_event(int epollfd, int fd, int state);
// Modify an event
void modify_event(int epollfd, int fd, int state);
// Remove an event
void delete_event(int epollfd, int fd, int state);

int main(int argc, char *argv[])
{
    int listenfd;
    listenfd = socket_bind(IPADDRESS, PORT);
    listen(listenfd, LISTENQ);
    do_epoll(listenfd);
    return 0;
}

int socket_bind(const char *ip, int port)
{
    int listenfd;
    struct sockaddr_in servaddr;
    listenfd = socket(AF_INET, SOCK_STREAM, 0); // create the socket
    if (listenfd == -1)
    {
        perror("socket error:");
        exit(1);
    }
    bzero(&servaddr, sizeof(servaddr));
    servaddr.sin_family = AF_INET;
    inet_pton(AF_INET, ip, &servaddr.sin_addr);
    servaddr.sin_port = htons(port);
    if (bind(listenfd, (struct sockaddr *)&servaddr, sizeof(servaddr)) == -1)
    {
        perror("bind error:");
        exit(1);
    }
    return listenfd;
}

void do_epoll(int listenfd)
{
    int epollfd;
    struct epoll_event events[EPOLLEVENTS];
    int ret;
    char buf[MAXSIZE];
    memset(buf, 0, MAXSIZE);
    // Create the epoll instance (the size argument is ignored since Linux 2.6.8, but must be > 0)
    epollfd = epoll_create(FDSIZE);
    // Register a read (EPOLLIN) event for the listening descriptor
    add_event(epollfd, listenfd, EPOLLIN);
    while (1)
    {
        // Block until at least one registered descriptor is ready
        ret = epoll_wait(epollfd, events, EPOLLEVENTS, -1);
        handle_events(epollfd, events, ret, listenfd, buf); // dispatch the ready events
    }
    close(epollfd); // unreachable here, but an epoll fd should be closed explicitly
}

void handle_events(int epollfd, struct epoll_event *events, int num, int listenfd, char *buf)
{
    int i;
    int fd;
    // Iterate over the ready events
    for (i = 0; i < num; i++)
    {
        fd = events[i].data.fd;
        if ((fd == listenfd) && (events[i].events & EPOLLIN))
        {
            // A readable listening socket means a client is connecting
            handle_accpet(epollfd, listenfd);
        }
        else if (events[i].events & EPOLLIN)
        {
            // A readable connected socket: a client sent us data
            do_read(epollfd, fd, buf);
        }
        else if (events[i].events & EPOLLOUT)
        {
            // A writable socket: send the buffered data back to the client
            do_write(epollfd, fd, buf);
        }
    }
}

void handle_accpet(int epollfd, int listenfd)
{
    int clifd;
    struct sockaddr_in cliaddr;
    socklen_t cliaddrlen = sizeof(cliaddr); // must be initialized before accept()
    // The same descriptor is used for both reading and writing
    clifd = accept(listenfd, (struct sockaddr *)&cliaddr, &cliaddrlen);
    if (clifd == -1)
    {
        perror("accept error:");
    }
    else
    {
        printf("accept a new client: %s:%d\n", inet_ntoa(cliaddr.sin_addr), ntohs(cliaddr.sin_port));
        // Register the new connection for read events on the same epoll instance
        add_event(epollfd, clifd, EPOLLIN);
    }
}

void do_read(int epollfd, int fd, char *buf)
{
    int nread;
    nread = read(fd, buf, MAXSIZE - 1); // leave room for a terminating '\0'
    if (nread == -1)
    {
        perror("read error:");
        delete_event(epollfd, fd, EPOLLIN); // deregister before closing the descriptor
        close(fd);
    }
    else if (nread == 0)
    {
        fprintf(stderr, "client close.\n");
        delete_event(epollfd, fd, EPOLLIN);
        close(fd);
    }
    else
    {
        buf[nread] = '\0';
        printf("read message is %s", buf);
        modify_event(epollfd, fd, EPOLLOUT); // switch to waiting for writability
    }
}

void do_write(int epollfd, int fd, char *buf)
{
    int nwrite;
    nwrite = write(fd, buf, strlen(buf));
    if (nwrite == -1)
    {
        perror("write error:");
        delete_event(epollfd, fd, EPOLLOUT); // deregister before closing the descriptor
        close(fd);
    }
    else
    {
        modify_event(epollfd, fd, EPOLLIN); // go back to waiting for more data to read
    }
    memset(buf, 0, MAXSIZE); // clear the buffer for the next read
}

void add_event(int epollfd, int fd, int state)
{
    struct epoll_event ev;
    ev.events = state; // the event mask (EPOLLIN and/or EPOLLOUT)
    ev.data.fd = fd;   // the descriptor epoll should watch
    epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &ev);
}

void modify_event(int epollfd, int fd, int state)
{
    struct epoll_event ev;
    ev.events = state;
    ev.data.fd = fd;
    epoll_ctl(epollfd, EPOLL_CTL_MOD, fd, &ev);
}

void delete_event(int epollfd, int fd, int state)
{
    struct epoll_event ev;
    ev.events = state;
    ev.data.fd = fd;
    epoll_ctl(epollfd, EPOLL_CTL_DEL, fd, &ev);
}