浅谈Redis对象和数据类型

Redis对象

下文中的redis源码选自Redis7.4

Redis定义了多种数据类型（SDS, linkedlist, dict(也叫hashtable), skiplist, intset, ziplist, listpack）和多种对象类型（String, List, Hash, Set, Sortedset），其中每种对象类型的底层至少对应一种数据类型
String
- int - 保存整数值并且该整数值的范围在long内
- 简单动态字符串（SDS），预分配空间来减少内存的频繁分配；容量小于1M之前，扩容都是翻倍，超过1M之后，每次只增加1M；最大长度为512M
List
- 默认底层是quicklist(linkedlist + listpack)
- 特殊情况下是linkedlist
Hash
- ziplist：键值对数量少并且每个键值对的字符串长度小
- dict(hashtable / compact hashtable) : 数组 + 链表
- 渐进式rehash
Set
- 无序
- intset：当集合保存的元素数量很少（512）并且全部都是整数值
- hashtable：value全部为NULL
Sortedset - 容器型
- 有序(默认升序)， value:score
- zset: skiplist + dict

Redis数据结构

SDS

在Redis中，可能会被修改的字符串类型都用它自定义的一个结构SDS来描述，只有一定不会更改的字面量才会使用C字符串

SDS(simple dynamic string,简单动态字符串)的定义包括当前字符串长度、已分配的空间大小、标志位以及字符串本身

// src/sds.h
struct __attribute__ ((__packed__)) sdshdr16 {
  uint16_t len; /* used */
  uint16_t alloc; /* excluding the header and null terminat or */
  unsigned char flags; /* 3 lsb of type, 5 unused bits */
  char buf[];
};

SDS的优点（相比于C字符串）
- 得到字符串长度不需要遍历字符串，将时间复杂度从O(N)变为O(1)
- 二进制安全
  - 由于字符串的长度已经通过len来记录，因此不需要从\0来判断字符串是否结束，故字符串类型可以存储二进制数据
- 空间预分配和惰性空间释放
  - 空间预分配：在扩展SDS空间之前，会先判断剩余空间是否足够，如果足够则直接使用；否则再判断所需空间是否大于1MB，不是则多分配和所需空间一样大小的未使用空间（例如所需空间是16B，则最终分配16B（将要被使用）+16B（不使用）+1B（字符串结束符号\0）= 33B大小的空间），是则多分配1MB空间（例如所需空间3MB，则最终分配 3MB + 1MB + 1B 大小的空间）。
  - 惰性空间释放：当字符串所使用的空间变少之后（例如删除某个字符），不会立刻回收该字符串所占据的空间，而是将其归到not use中。

Linkedlist

链表的底层是Linkedlist，该类型的数据结构是一个双向链表。

// src/adlist.h
typedef struct listNode {
  struct listNode *prev;
  struct listNode *next;
  void *value;
} listNode;

typedef struct list {
  listNode *head;
  listNode *tail;
  void *(*dup)(void *ptr);
  void (*free)(void *ptr);
  int (*match)(void *ptr, void *key);
  unsigned long len;
} list;

链表节点包含该节点的值、prev指针和next指针。
链表结构包含指向头节点和尾节点的指针、链表长度、复制链表的函数、释放链表节点的函数以及匹配节点的函数

Dict(HashTable)

Hash的底层数据结构是哈希表。存储键值对时，会先根据键计算哈希值(siphash)和索引，然后将值存放到哈希表中对应索引处。

计算哈希值:

hash = dictHashKey(d, key, d->useStoredKeyApi);
static inline uint64_t dictHashKey(dict *d, const void *key, int isStoredKey) {
  if (isStoredKey && d->type->storedHashFunction)
    return d->type->storedHashFunction(key);
  else
    return d->type->hashFunction(key);
}

计算索引： idx = hash & DICTHT_SIZE_MASK(d->ht_size_exp[0]);

Hash的源码如下

// src/dict.h
struct dict {
  dictType *type;

  dictEntry **ht_table[2];
  unsigned long ht_used[2];

  long rehashidx; /* rehashing not in progress if rehashidx == -1 */

  /* Keep small vars at end for optimal (minimal) struct padding */
  unsigned pauserehash : 15; /* If >0 rehashing is paused */

  unsigned useStoredKeyApi : 1; /* See comment of storedHashFunction above */
  signed char ht_size_exp[2]; /* exponent of size. (size = 1<<exp) */
  int16_t pauseAutoResize;  /* If >0 automatic resizing is disallowed (<0 indicates coding error) */
  void *metadata[];
};

type指向一系列回调函数，可以根据当前类型来设定不同的函数，例如type->hashFunction()可以计算对应的哈希值

ht_table是一个包含两个dictEntry指针的数组（用于渐进式rehash）,dictEntry结构体中的next指针指向下一个节点，该节点和当前节点拥有相同的哈希值

// src/dict.c
struct dictEntry {
  void *key;
  union {
    void *val;
    uint64_t u64;
    int64_t s64;
    double d;
  } v;
  struct dictEntry *next;     /* Next entry in the same hash bucket. */
};

ht_used对应两个ht_table，分别表示每个哈希表当前存储的键值对的数量
rehashidx表示rehash的进度，当其值为-1时表示没有在rehash

哈希冲突
- 当有两个或以上的键被分配到了同一个索引上面，就会出现哈希冲突
- 通过链表来解决冲突。每个索引上的节点dictEntry都有一个next字段，可以指向下一个节点。并且由于不是双向链表，插入该节点是头插法
  1
  2
  3
  4
  5
  6
  dictEntry *entry;
  entry = zmalloc(sizeof(*entry));
  entry->key = key;
  entry->next = *bucket;
  *bucket = entry;
  d->ht_used[htidx]++;
渐进式rehash - 当哈希表的负载因子超出合适的范围时，需要对哈希表进行扩容或者缩容
- 根据需要给ht[1]分配合适的大小（扩容，则分配第一个大于等于ht[0].used * 2的2的次方幂；缩容，则分配第一个大于等于ht[0].used的2的次方幂）、
- 重新计算ht[0]中每个键的哈希和索引，将其键值对放到ht[1]上（在此过程中，对该哈希表的增只对ht[1]进行操作，删改查则需要对ht[0]和ht[1]都进行操作）
- 迁移完后，ht[0]变为ht[1],ht[1]变为ht[0]

skiplist

skiplist是有序集合SortedSet选择zset作为编码时的底层实现，可以理解为多层链表。最高层的节点最少，最低层包含所有节点。

skiplist相关的源码如下

// src/server.h
typedef struct zskiplistNode {
  sds ele; // 具体数据
  double score; // 分数
  struct zskiplistNode *backward;
  // 层级数组
  struct zskiplistLevel {
      struct zskiplistNode *forward;
      unsigned long span; // 跨度
  } level[];
} zskiplistNode;

typedef struct zskiplist {
  struct zskiplistNode *header, *tail; // 头结点和尾节点
  unsigned long length; // 节点数量
  int level; // 层数
} zskiplist;

typedef struct zset {
  dict *dict;
  zskiplist *zsl;
} zset;
// zset相关的api操作在 src/t_zset.c
// 比较重要的有zslCreate、zslFree、zslGetRank

通过建立索引来提高查找元素的效率，即空间换时间
zsl
- 在同一个跳跃表中，各个节点保存的成员对象必须是唯一的，但是多个节点保存的分值可以相同，分值相同节点会按照成员对象在字典序中的大小来进行排序，较小的靠前。
- 插入节点时，该节点的层数通过zslRandomLevel获取

ziplist（Redis7.0后被listpack替代）

ziplist早期是列表List的底层实现之一，当列表键只包含少量列表项，并且每个列表项是小整数或者是短字符串的时候，Redis就会使用ziplist作为底层实现

// src/ziplist.h 7.4版本已经优化了，优化后如下
/* Each entry in the ziplist is either a string or an integer. */
typedef struct {
  /* When string is used, it is provided with the length (slen). */
  unsigned char *sval;
  unsigned int slen;
  /* When integer is used, 'sval' is NULL, and lval holds the value. */
  long long lval;
} ziplistEntry;

// 5.0版本
typedef struct zlentry {
  unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
  unsigned int prevrawlen;     /* Previous entry len. */
  unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                  For example strings have a 1, 2 or 5 bytes
                                  header. Integers always use a single byte.*/
  unsigned int len;            /* Bytes used to represent the actual entry.
                                  For strings this is just the string length
                                  while for integers it is 1, 2, 3, 4, 8 or
                                  0 (for 4 bit immediate) depending on the
                                  number range. */
  unsigned int headersize;     /* prevrawlensize + lensize. */
  unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                  the entry encoding. However for 4 bits
                                  immediate integers this can assume a range
                                  of values and must be range-checked. */
  unsigned char *p;            /* Pointer to the very start of the entry, that
                                  is, this points to prev-entry-len field. */
} zlentry;

ziplist是一个连续的数组，其结构如下

zlbytes zltail zllen entry1 entry2 … zlend
- zlbytes - 4字节，记录整个ziplist占用的字节数
- zltail - 4字节，记录最后一个entry的偏移
- zllen - 2字节，记录entry个数
- entry - 存储具体的数据
- zlend（0xFF） - 1字节，标记ziplist的结束
缺陷 - 连锁更新
- 由于早期entry保存了prevlen（如果前一个entry小于254字节，则使用1字节记录prevlen；否则用5个字节记录prevlen）
- 假设entry1\entry2\…\entryn都是253字节，修改entry1后它变成了254字节，那么后续所有的entry都需要更改
改进 - listpack（Redis7.0版本后用listpack代替了ziplist）

intset

intset是集合Set的底层实现之一，其特点是有序且不重复，相关源码如下

typedef struct intset {
  uint32_t encoding; // encoding的值为INTSET_ENC_INT16或者INTSET_ENC_INT32或者INTSET_ENC_INT64
  uint32_t length;
  int8_t contents[];
} intset;

当需要新插入一个元素，并且该元素的类型比当前集合中所有元素的类型都大时，需要先将整个集合升级，把原来的元素移动到合适的位置后（逆序移动），再插入新的元素
- 升级的优点：避免类型错误（不把不同的元素类型放在同一个数组中）；在需要的时候再升级，节约了内存
- 升级后不能降级

listpack

listpack取代了ziplist，作为list的底层实现之一。源码如下

typedef struct {
  /* When string is used, it is provided with the length (slen). */
  unsigned char *sval;
  uint32_t slen;
  /* When integer is used, 'sval' is NULL, and lval holds the value. */
  long long lval;
} listpackEntry;

listpack的内存布局如下

totalBytes numElements element1 element2 … listEnd
- totaltbytes - 4字节，表示listpack总长度
- numElements - 2字节，表示当前元素数量
- listEnd（0xFF） - 1字节，标识当前listpack结束
- element的结构如下(encode-type + data + element-len)
  - encodetype - 编码方案
  - data - 具体数据
  - element-len - 前两个条目占用的字节数，主要用于从右往左遍历