go map 的复习

重温 go map 数据结构相关设计时，顿觉老矣，很多内容都模糊不清了，写一遍加深印象

版本：go 1.18.1

主要源码信息 src/runtime/map.go

const (
	// Maximum number of key/elem pairs a bucket can hold.
	bucketCntBits = 3
	bucketCnt     = 1 << bucketCntBits

	// Maximum average load of a bucket that triggers growth is 6.5.
	// Represent as loadFactorNum/loadFactorDen, to allow integer math.
	loadFactorNum = 13
	loadFactorDen = 2

	// Maximum key or elem size to keep inline (instead of mallocing per element).
	// Must fit in a uint8.
	// Fast versions cannot handle big elems - the cutoff size for
	// fast versions in cmd/compile/internal/gc/walk.go must be at most this elem.
	maxKeySize  = 128
	maxElemSize = 128
	
	// data offset should be the size of the bmap struct, but needs to be
	// aligned correctly. For amd64p32 this means 64-bit alignment
	// even though pointers are 32 bit.
	dataOffset = unsafe.Offsetof(struct {
		b bmap
		v int64
	}{}.v)
	
	...
)
// A header for a Go map.
type hmap struct {
	// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
	// Make sure this stays in sync with the compiler's definition.
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8 
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed

	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

	extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
	// If both key and elem do not contain pointers and are inline, then we mark bucket
	// type as containing no pointers. This avoids scanning such maps.
	// However, bmap.overflow is a pointer. In order to keep overflow buckets
	// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
	// overflow and oldoverflow are only used if key and elem do not contain pointers.
	// overflow contains overflow buckets for hmap.buckets.
	// oldoverflow contains overflow buckets for hmap.oldbuckets.
	// The indirection allows to store a pointer to the slice in hiter.
	overflow    *[]*bmap
	oldoverflow *[]*bmap

	// nextOverflow holds a pointer to a free overflow bucket.
	nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

简要描述

map实际是由 hash 表（hmap）实现，结构中的 buckets 数组用来实际存储数据，每个bucket（bmap）能存 bucketCnt 组 key/value 数据，默认为 8

一些重点

bmap 在编译时会动态创建为新结构，通过指针地址操作（dataOffset）

type bmap struct {
    topbits  [8]uint8 // tophash，hash值的高8位
    keys     [8]keytype // 连续存放 8 个key
    elems    [8]valuetype // 连续存放 8 个value
    pad      uintptr // 内存对齐填充
    overflow uintptr // 溢出桶指针，指向一个新的桶，链表法解决hash冲突
}

hash0 通过 fastrand() 在创建时（makemap）随机
初始化 map 时，会根据 key 类型选取不同的 hash 方法，提升效率
- slice 不可比较，所以不能作为 key（可以 hash，但即使定位到 bucket 也无法匹配到 key）
- func,map,slice 或内部包含这三种类型的array、struct类型，均不可比较
bmap中，key/value 是单独放到一起的，的目的是为了节省空间（内存对齐），如源码注释中提到的map[int64]int8
通过对 key hash 值低位（hmap.B 位）对应的 bucket 的数组下标
key hash 值的高8位存储在 bucket 中的 tophash 数组中，查找时遍历比较，如果没找到且 overflow 不为空，还要继续去 overflow 的 bucket 中寻找，直至找到或 overflow 为 nil
hmap.flags 包含写状态，如果检测到并发读写，会panic
hmap.extra 是为让 gc 不扫描此类 bucket，提升效率
- 只要 bmap 内不含指针就不需gc扫描
- bucket 内若只有 overflow 是指针（k/v 不含指针），则 bmap 的生成函数会将 overflow 类型转为uintptr，并使用 mapextra 引用该桶，既避免了 bucket 的 gc 扫描，又保证其 overflow 桶存活（gc不会把 uintptr 当作指针）
写操作时，如果 hash 低位落在同一个 bucket 中的 key 过多，超过8个，就会由 overflow bucket 承接，即通过拉链法来解决 hash 冲突，否则找一个空位置存入 tophash 以及相应的 k/v
写操作如果需要增加新 bucket 时需要扩容，只记录，具体执行会分散到写操作和删除操作中渐进进行，避免迁移迁移成本过高
对于大的 k/v （maxKeySize/maxElemSize）也会转为指针，方便内存对齐以及控制桶的整体大小
map 不能对其值取地址
- 因为map内部有渐进式扩容，所以map的值地址不固定，取地址没有意义
- 对于值类型为 slice 或 struct，不能直接操作其内部元素, 只有把他们各自当做整体去赋值操作才是安全的
遍历 map 时，先调用 mapiterinit 初始化用于遍历的 hiter 结构体, 会随机定位出一个起始遍历的桶hiter.startBucket, 这也是 map 遍历无序的原因
删除 key 时，只是标记为 empty，并未释放空间（k/v 如果含指针，gc自行处理），需要将整个 map 置为 nil 才能完成释放，大概是为了遍历时不发生异常吧
- 这个行为跟版本、对象分配是在堆或栈上有关，比如1.18.1中，map[int]int，局部变量 delete key 后，执行 gc 会释放内存（），而 map 使用全局变量时，只有置为 nil 后 gc 才会释放内存
- 因为 GC 只针对堆上的内存
  - 堆上的内存自行分配释放（gc）
  - 栈的内存是由编译器自动进行分配和释放
- 引用类型的全局变量内存分配在堆上，值类型的全局变量分配在栈上；但局部变量内存分配可能在栈上也可能在堆上

扩容（hmap.B+1）方式

负载因子超过默认值（biggerSizeGrow）
- 6.5，即每个bucket平均存储6.5个k/v
overflow的bucket数量过多（sameSizeGrow）
- 元素不断的进行增删造成溢出桶很多，效率降低，所以需要回收多余的overflow桶
- 借助hmap.noverflow来判断溢出桶是否过多
  - hmap.B<=15 时，判断是溢出桶是否多于桶数 1<<hmap.B，否则只判断溢出桶是否多于 1<<15
  - 这也就可以解释为什么当hmap.noverflow值接近1<<15 - 1时为近似值, 只要可以评估是否溢出桶过多不合理就行了
    参考
    原文链接

简要描述

一些重点

扩容（hmap.B+1）方式

参考