Redis内部数据结构详解之ziplist

程序员文章站 2023-12-31 12:33:40

...

本文所引用的源码全部来自Redis2.8.2版本。 Redis中ziplist数据结构与API相关文件是：ziplist.h, ziplist.c, t_zset.c。一、ziplist的构成 zlbyteszltailzllenentryentryzlend zlbytes是一个4字节无符号整数，用来存储整个ziplist占用的字节数； zltail是一

本文所引用的源码全部来自Redis2.8.2版本。

Redis中ziplist数据结构与API相关文件是：ziplist.h, ziplist.c, t_zset.c。

一、ziplist的构成

是一个4字节无符号整数，用来存储整个ziplist占用的字节数；

是一个4字节无符号整数，用来存储ziplist最后一个节点的相对于ziplist首地址偏移量；

是一个2字节无符号整数，存储ziplist中节点的数目，最大值为(2^16 - 2)，当zllen大于最大值时，需要遍历整个ziplist才能获取ziplist节点的数目；

ziplist存储的节点，各个节点的字节数根据内容而定；

是一个1字节无符号整数，值为255(11111111)，作为ziplist的结尾符

二、ziplist宏模块

宏名称	作用
ZIPLIST_BYTES(zl)	获取zlbytes值
ZIPLIST_TAIL_OFFSET(zl)	获取zltail值
ZIPLIST_LENGTH(zl)	获取zllen值
ZIPLIST_HEADER_SIZE	获取ziplist header的长度，固定为10个字节
ZIPLIST_ENTRY_HEAD(zl)	获取ziplist第一个entry的首地址
ZIPLIST_ENTRY_TAIL(zl)	获取ziplist最后一个entry的首地址
ZIPLIST_ENTRY_END(zl)	获取ziplist的末端，即zlend首地址

三、ziplist典型结构分布图

Redis内部数据结构详解之ziplist

图中相关域以及宏的解析见上面的分析。

四、entry结构解析

Redis内部数据结构详解之ziplist

prev_entry_bytes_length：

表示上个节点所占的字节数，即上个节点的长度，如果需要跳到上个节点，而已知道当前节点的首地址p,上个节点的首地址prev = p-prev_entry_bytes_length

根据编码方式的不同，prev_entry_bytes_length可能占1 bytes或5 bytes：

1 bytes：如果上个节点的长度小于254，那么就只需要1个字节；

5 bytes：如果上个节点的长度大于等于254，那么就将第一个字节设为254(1111 1110),然后接下来的4个字节保存实际的长度值；

encoding与length：

ziplist的编码类型分为字符串、整数
encoding的前两个比特位用来判断编码类型是字符串或整数：00, 01, 10表示contents中保存着字符串
11表示contents中保存着整数

字符串的具体编码方式：(_:预留，a:实际的二进制数)

编码	编码长度	contents中的值
00aaaaaa	1 bytes	长度[0,63]个字节的字符串
01aaaaaa bbbbbbbb	2 bytes	长度[64,16383]个字节的字符串
01______ aaaaaaaa bbbbbbbb cccccccc dddddddd	5 bytes	长度[16384,2^32-1]个字节的字符串

Redis内部数据结构详解之ziplist

11开头的整型编码方式：

编码	编码长度	contents中的值
1100 0000	1 bytes	int16_t类型整数
1101 0000	1 bytes	int32_t类型整数
1110 0000	1 bytes	int64_t类型整数
1111 0000	1 bytes	24 bit 有符号整数
1111 1110	1 bytes	8 bit 有符号整数
1111 xxxx	1 bytes	4 bit 无符号整数,[0,12]

解释一下1111 xxxx编码：1111 xxxx 首先最小值应该是0001（0000已经被占用），最大值应该是1101（1110与111 1均已经被占用），因此，可被编码的值实际上只能是 1 至 13，由于还需要减1，所以实际只能编码[0,12]，至于减1的理由，我的理解是方便编码0。

Redis内部数据结构详解之ziplist

五、zlentry数据结构

typedef struct zlentry {
    //前一个节点长度的存储所占的字节数，上个节点占用的长度
    unsigned int prevrawlensize, prevrawlen;
    //当前节点长度的存储所占的字节数，当前节点占用的长度
    unsigned int lensize, len;
    unsigned int headersize;//当前节点的头部大小
    unsigned char encoding;//当前链表结点长度（即字段len）使用的编码类型
    unsigned char *p; //指向当前结点起始位置的指针
} zlentry;

zlentry结构体就是用来存储当前节点的相关信息的，这些信息需要解析得到，解析的代码如下：

//判断是否为字符串编码
#define ZIP_ENTRY_ENCODING(ptr, encoding) do {  \
    (encoding) = (ptr[0]); \
    if ((encoding) 

六、ziplist的主要函数列表

	
		


			函数名
			
			
			作用
			
			
			复杂度
			
			


			ziplistNew
			
			
			创建一个新的ziplist
			
			
			O(1)
			
			


			ziplistResize
			
			
			重新调整ziplist的大小
			
			
			O(1)
			
			


			__ziplistCascadeUpdate
			
			
			级联更新ziplist中entry的大小
			
			
			O(N*M)
			
			


			__ziplistDelete
			
			
			删除指定位置开始的N个节点
			
			
			O(N*M)
			
			


			__ziplistInsert
			
			
			在指定位置之前插入一个节点
			
			
			O(N*M)
			
			


			ziplistIndex
			
			
			获取第N个节点的首地址
			
			
			O(N)
			
			


			ziplistNext
			
			
			获取当前节点的下一个节点首地址
			
			
			O(1)
			
			


			ziplistPrev
			
			
			获取当前节点的前一个节点首地址
			
			
			O(1)
			
			


			ziplistGet
			
			
			获取给定节点的数据
			
			
			O(1)
			
			


			ziplistFind
			
			
			找到ziplist中包含给定数据的节点
			
			
			O(N)
			
			


			ziplistLen
			
			
			获取ziplist中节点的数目
			
			
			O(N)
			
			


			ziplistBlobLen
			
			
			以字节为单位，返回ziplist占用的内存大小
			
			
			O(1)
			
			


	
		
			
				
					 
			
		
	

七、另外三个简单而重要的函数
 
//p:当前节点首地址，len:上一个节点的长度值
//如果p==NULL，则返回编码len需要多少个字节，否则将该len存储在当前节点的prev_entry_bytes_length区域
static unsigned int zipPrevEncodeLength(unsigned char *p, unsigned int len) {
    if (p == NULL) {
        return (len 
//计算出节点所占的总字节数
static unsigned int zipRawEntryLength(unsigned char *p) {
    unsigned int prevlensize, encoding, lensize, len;
    ZIP_DECODE_PREVLENSIZE(p, prevlensize);//得到编码前一个节点长度的字节数
    //得到存储当前节点的长度的字节数，当前节点数据的长度
    //注意保存字符串与整数之间的差别
    ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len);
    return prevlensize + lensize + len;
}
/* Return the difference in number of bytes needed to store the length of the
 * previous element 'len', in the entry pointed to by 'p'. */
//计算需要存储len所需字节数与当前节点p的prev_entry_bytes_length的差值
static int zipPrevLenByteDiff(unsigned char *p, unsigned int len) {
    unsigned int prevlensize;
    ZIP_DECODE_PREVLENSIZE(p, prevlensize);
    return zipPrevEncodeLength(NULL, len) - prevlensize;
}
 
八、__ziplistCascadeUpdate、__ziplistDelete、__ziplistInsert函数详解


注；以下注释与代码中的一些字段请参照zlentry数据结构中的定义，这三个函数是ziplist一切操作的核心，尤其是在memmove进行数据移动的时候更需要多思考，在阅读下面的代码之前先阅读ziplistResize函数，而且需要对C语言中的realloc重新分配内存函数需要有一定的了解。
 
/**
 * 当将一个新节点添加到某个节点之前的时候，如果原节点的prevlen不足以保存新节点的长度，
 * 那么就需要对原节点的空间进行扩展（从 1 字节扩展到 5 字节）。
 *
 * 但是，当对原节点进行扩展之后，原节点的下一个节点的 prevlen 可能出现空间不足，
 * 这种情况在多个连续节点的长度都接近 ZIP_BIGLEN 时可能发生。
 *
 * 这个函数就用于处理这种连续扩展动作。
 *
 * 因为节点的长度变小而引起的连续缩小也是可能出现的，不过，为了避免扩展-缩小-扩展-缩小这样的情况反复出现（flapping，抖动），
 * 我们不处理这种情况，而是任由 prevlen 比所需的长度更长
 *
 * 复杂度：O(N^2)
 *
 * 返回值：更新后的 ziplist
 * zl: ziplist首地址，p:需要扩展prevlensize的节点首地址
 */
static unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize;
    size_t offset, noffset, extra;
    unsigned char *np;
    zlentry cur, next;

    while (p[0] != ZIP_END) {
        cur = zipEntry(p);
        rawlen = cur.headersize + cur.len; //整个entry的字节数
        rawlensize = zipPrevEncodeLength(NULL,rawlen); //存储rawlen需要的字节数

        /* Abort if there is no next entry. */
        if (p[rawlen] == ZIP_END) break;// 已经到达表尾，退出
        next = zipEntry(p+rawlen);//得到下一个节点的zlentry

        /* Abort when "prevlen" has not changed. */
        // 如果下一的prevlen等于当前节点的rawlen，那么说明编码大小无需改变，退出
        if (next.prevrawlen == rawlen) break;

        // 下一节点的长度编码空间不足，进行扩展
        if (next.prevrawlensize  rawlensize) {
                /* This would result in shrinking, which we want to avoid.
                 * So, set "rawlen" in the available bytes. */
                zipPrevEncodeLengthForceLarge(p+rawlen,rawlen);
            } else {
                zipPrevEncodeLength(p+rawlen,rawlen);
            }

            /* Stop here, as the raw length of "next" has not changed. */
            break;//后面的节点不用扩展
        }
    }
    return zl;
}
 
/* Delete "num" entries, starting at "p". Returns pointer to the ziplist. */
//从指针 p 开始，删除 num 个节点
static unsigned char *__ziplistDelete(unsigned char *zl, unsigned char *p, unsigned int num) {
    unsigned int i, totlen, deleted = 0;
    size_t offset;
    int nextdiff = 0;
    zlentry first, tail;

    first = zipEntry(p); //删除的首个节点
    for (i = 0; p[0] != ZIP_END && i  0) {
        if (p[0] != ZIP_END) {
            /* Storing `prevrawlen` in this entry may increase or decrease the
             * number of bytes required compare to the current `prevrawlen`.
             * There always is room to store this, because it was previously
             * stored by an entry that is now being deleted. */
            //计算删除的第一个节点first的prevrawlensize与p节点prevrawlensize的差值
            nextdiff = zipPrevLenByteDiff(p,first.prevrawlen);
            p -= nextdiff; //根据nextdiff值，对p进行向前或向后偏移，留取的字节来保存first.prevrawlen
            zipPrevEncodeLength(p,first.prevrawlen);//将first.prevrawlen值存储在p的prevrawlensize中

            /* Update offset for tail */
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))-totlen);

            /* When the tail contains more than one entry, we need to take
             * "nextdiff" in account as well. Otherwise, a change in the
             * size of prevlen doesn't have an effect on the *tail* offset. */
            //如果p不是尾节点，那么尾节点指针的首地址还需要加上nextdiff
            tail = zipEntry(p);
            if (p[tail.headersize+tail.len] != ZIP_END) {
                ZIPLIST_TAIL_OFFSET(zl) =
                   intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
            }

            /* Move tail to the front of the ziplist */
            //first.p至p之间的节点都是需要删除的，因此需要将p开始的数据向前偏移，zlend不需要处理，因此需要-1
            memmove(first.p,p,intrev32ifbe(ZIPLIST_BYTES(zl))-(p-zl)-1);
        } else {
            /* The entire tail was deleted. No need to move memory. */
            //如果已经删除到zlend，那么尾节点指针应该指向被删除的first之前的节点首地址
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe((first.p-zl)-first.prevrawlen);
        }

        /* Resize and update length */
        offset = first.p-zl;
        zl = ziplistResize(zl, intrev32ifbe(ZIPLIST_BYTES(zl))-totlen+nextdiff);
        ZIPLIST_INCR_LENGTH(zl,-deleted);
        p = zl+offset;

        /* When nextdiff != 0, the raw length of the next entry has changed, so
         * we need to cascade the update throughout the ziplist */
        /**
            如果nextdiff不等于0，说明现在的p节点的长度变了，需要级联更新下个节点能否保存
            p节点的长度值
        */
        if (nextdiff != 0)
            zl = __ziplistCascadeUpdate(zl,p);
    }
    return zl;
}
 
/* Insert item at "p". */
//添加保存给定元素s的新节点插入到地址p之前，然后原有的数据向后偏移
//zl: ziplist首地址，p:插入位置指针，s:待插入的字符串首地址，slen:待插入字符串长度
static unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry entry, tail;

    /* Find out prevlen for the entry that is inserted. */
    // 那么取出节点相关资料，以及 prevlen
    if (p[0] != ZIP_END) {//p之后存在节点
        entry = zipEntry(p);//取出p节点的相关资料
        prevlen = entry.prevrawlen;//得到p节点前一个节点所占字节数
    } else {//p之后没有节点，达到zlend,那么就取出尾节点
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {//如果存在尾节点
            prevlen = zipRawEntryLength(ptail);//得到尾节点的总字节数，然后在尾节点之后insert
        }//否则就应该是在一个空的ziplist第一个insert一个节点
    }

    /* See if the entry can be encoded */
    // 查看能否将新值保存为整数，如果可以的话返回 1 ，
    // 并将新值保存到 value ，编码形式保存到 encoding
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        //s 可以保存为整数，那么继续计算保存它所需的空间
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipEncodeLength will use the
         * string length to figure out how to encode it. */
        // 不能保存为整数，直接使用字符串长度
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    // 计算编码 prevlen 所需的长度
    reqlen += zipPrevEncodeLength(NULL,prevlen);
    //计算编码slen所需的长度
    reqlen += zipEncodeLength(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    //当插入的位置不为尾部时，需要确保下一个节点的存储前一个
    //节点所占自己数的空间能够存储即将插入节点的长度
    // zipPrevLenByteDiff 的返回值有三种可能：
    // 1）新旧两个节点的编码长度相等，返回 0
    // 2）新节点编码长度 > 旧节点编码长度，返回 5 - 1 = 4
    // 3）旧节点编码长度 > 新编码节点长度，返回 1 - 5 = -4
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;//保存当前的偏移量，在这偏移量之前的数据不需要改变，只需要改变在此之后的数据
    // 重分配空间，并更新长度属性和表尾
    // 新空间长度 = 现有长度 + 新节点所需长度 + 编码新节点长度所需的长度差
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        //将原有从p-nextdiff开始全部向后偏移，余留出reqlen保存即将insert的数据
        /**
            nextdiff = -4:原来p有5个字节来存储上个节点的长度，而现在只需要1个，
                          因此只需要将p+4后面的字节偏移到p+reqlen即可，这样就
                          只保留1个字节保存reqlen的长度了
            nextdiff = 4: 原来p只有1个字节来存储上个节点的长度，现在需要5个，
                          那就将p-4后面的字节偏移到p+reqlen，这样p原来有1个字节
                          加上多偏移来的4个字节就可以保存reqlen的长度了
            nextdiff = 0: 不需要考虑
        */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        zipPrevEncodeLength(p+reqlen,reqlen);//下个节点保存即将insert数据的所占字节数

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        // 有需要的话，将 nextdiff 也加上到 zltail 上
        tail = zipEntry(p+reqlen);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        // 更新 ziplist 的 zltail 属性，现在新添加节点为表尾节点
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    /**
        如果nextdiff不等于0，说明现在的p+reqlen节点的长度变了，需要级联更新下个节点能否保存
        p+reqlen节点的长度值
    */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    p += zipPrevEncodeLength(p,prevlen);//填写保存上一个节点长度的字节数
    p += zipEncodeLength(p,encoding,slen);//填写保存当前节点长度的字节数
    if (ZIP_IS_STR(encoding)) {//保存当前节点的字符串
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);//整数
    }
    ZIPLIST_INCR_LENGTH(zl,1);//length + 1
    return zl;
}
ziplist剩余的一些函数就不一一列举分析，理解上述三个函数，其余都很简单，关注这三个函数的重点是作者在realloc之后如何通过memmove保证后续节点的数据保持不变的，不得不说Redis的作者对ziplist的实现很让人佩服。
另外，推荐Redis黄健宏（huangz1990）的Redis设计与实现一书中有关于ziplist插入与删除节点的简单模拟实现。
九、小结
ziplist的最大的好处就是相比skiplist节约大量内存，但是在插入、删除、查询等操作上的时间复杂度与skiplist都是无法比拟的，当保存的数据比较少时，操作的时间自然可以接受，内存就是关键因素。
ziplist在Redis中主要用于Hash Table、List、Sorted Set数据类型小范围数据的存储，本文描述的ziplist的存储都是无序的，如何实现在Sorted Set中的有序存储待以后分析，无序变有序无疑又增加的时间复杂度。
总之，ziplist就是一种用时间换空间的策略。
最后感谢黄健宏（huangz1990）的Redis设计与实现及其他对Redis2.6源码的相关注释对我在研究Redis2.8源码方面的帮助。
由于本文其他有关ziplist的函数没有分析，包括本文内容有问题欢迎评论，有些代码我也不是特别的明白，谢谢。

相关标签： Redis 内部数据结构详解 ziplist 本文所引

Redis内部数据结构详解之ziplist

Redis内部数据结构详解之ziplist

java数据结构排序算法之树形选择排序详解

java数据结构排序算法之归并排序详解

Java描述数据结构学习之链表的增删改查详解

详解java数据结构与算法之双链表设计与实现

Python基础学习之基本数据结构详解【数字、字符串、列表、元组、集合、字典】

Java描述数据结构学习之链表的增删改查详解

Vue源码之关于vm.$delete()/Vue.use()内部原理详解

Python基础学习之基本数据结构详解【数字、字符串、列表、元组、集合、字典】

详解java数据结构与算法之双链表设计与实现

函数名	作用	复杂度
ziplistNew	创建一个新的ziplist	O(1)
ziplistResize	重新调整ziplist的大小	O(1)
__ziplistCascadeUpdate	级联更新ziplist中entry的大小	O(N*M)
__ziplistDelete	删除指定位置开始的N个节点	O(N*M)
__ziplistInsert	在指定位置之前插入一个节点	O(N*M)
ziplistIndex	获取第N个节点的首地址	O(N)
ziplistNext	获取当前节点的下一个节点首地址	O(1)
ziplistPrev	获取当前节点的前一个节点首地址	O(1)
ziplistGet	获取给定节点的数据	O(1)
ziplistFind	找到ziplist中包含给定数据的节点	O(N)
ziplistLen	获取ziplist中节点的数目	O(N)
ziplistBlobLen	以字节为单位，返回ziplist占用的内存大小	O(1)