Go Blog: Go slice internals 学习笔记

Slice internals

A slice is a descriptor of an array segment. It consists of a pointer to the array, the length of the segment, and its capacity (the maximum length of the segment).

一个切片就是一个数组片段的描述符。它(切片)包含了一个指向数组的指针,片段的长度和容量(指这个片段的最大长度)

Our variable s, created earlier by make([]byte, 5), is structured like this:

变量 s, 先前通过 make([]byte, 5) 创建,结构如下:

The length is the number of elements referred to by the slice. The capacity is the number of elements in the underlying array (beginning at the element referred to by the slice pointer). The distinction between length and capacity will be made clear as we walk through the next few examples.

长度就是切片引用的元素数量。容量就是底层数组(从切片指针引用的元素开始,到整个数组末尾)的元素数量。举例说明:

As we slice s, observe the changes in the slice data structure and their relation to the underlying array:

当我们切割 s 时,观察切片 s 的数据结构以及它们和底层数组间关系的变化:

1
s = s[2:4]

Slicing does not copy the slice’s data. It creates a new slice value that points to the original array. This makes slice operations as efficient as manipulating array indices. Therefore, modifying the elements (not the slice itself) of a re-slice modifies the elements of the original slice:

分割并没有拷贝切片的数据。它会创建一个新的 slice,新的 slice 还是指向了原来的数组。这使得 切片操作 就和操作数组索引一样高效。因此,修改一个重新分片后的切片的元素(不是切片本身),也就是在修改原始切片的元素,看例子:

1
2
3
4
5
6
d := []byte{'r', 'o', 'a', 'd'}
e := d[2:] 
// e == []byte{'a', 'd'}
e[1] = 'm'
// e == []byte{'a', 'm'}
// d == []byte{'r', 'o', 'a', 'm'}

Note: e 是通过在切片 d 上执行 切片操作[start:end] 而得到的新切片,但修改 e 的某个元素也会修改原始切片的元素的

Earlier we sliced s to a length shorter than its capacity. We can grow s to its capacity by slicing it again:

早先的时候,我们把 s 切割到了一个比它的容量还小的长度. 我们可以重新切片把它的长度扩展到和容量一样。实例如下:

1
2
3
4
5
// Earlier code is:
s = s[2:4]

// Now 
s = s[:cap(s)]

A slice cannot be grown beyond its capacity. Attempting to do so will cause a runtime panic, just as when indexing outside the bounds of a slice or array. Similarly, slices cannot be re-sliced below zero to access earlier elements in the array.

一个切片不能扩展的超过其容量。如果非要这么做的话,就会导致一个运行时 panic,因为索引越界了。类似地,切片不能在零以下重新切片以访问数组中的早期元素(即:slice 指向数组的指针只能向后移动, 不能向前移动)。

Growing slices(the copy and append functions)

To increase the capacity of a slice one must create a new, larger slice and copy the contents of the original slice into it. This technique is how dynamic array implementations from other languages work behind the scenes. The next example doubles the capacity of s by making a new slice, t, copying the contents of s into t, and then assigning the slice value t to s:

要增加一个切片的容量,就必须创建一个新的,更大的切片并且把原始切片的内容拷贝到新建立的切片中去。这种技术就是其他语言的动态数组的背后实现。下面例子通过创建一个新的切片 t,并将 s 的内容复制到 t,然后将切片值 t 分配给 s 来使 s 的容量加倍:

1
2
3
4
5
6
7
t := make([]byte, len(s), (cap(s)+1) * 2) // +1 in case cap(s) == 0

for i := range s {
    t[i] = s[i]
}

s = t

The looping piece of this common operation is made easier by the built-in copy function. As the name suggests, copy copies data from a source slice to a destination slice. It returns the number of elements copied.

上面这种蝉蛹的循环操作可以使用内建的 copy 函数更加轻松的完成。顾名思义 …

1
func copy(dst, src []T) int

The copy function supports copying between slices of different lengths (it will copy only up to the smaller number of elements). In addition, copy can handle source and destination slices that share the same underlying array, handling overlapping slices correctly.

copy 函数支持在不同长度的切片间进行拷贝(仅支持拷贝小于等于原slice长度的数据)。另外,拷贝函数可以处理共享了底层数组的原切片和目的切片,并且会很好的处理重叠的部分.

Using copy, we can simplify the code snippet above:

1
2
3
t := make([]byte, len(s), (cap(s)+1)*2)
copy(t, s)
s = t

A common operation is to append data to the end of a slice. This function appends byte elements to a slice of bytes, growing the slice if necessary, and returns the updated slice value:

一个经常使用的操作是向slice末尾加数据.下面的函数向 byte slice 里添加字节数据, 如果有必要, 会自动增加容量, 返回更新后的 slice.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
func AppendByte(slice []byte, data ...byte) []byte {
    m := len(slice)
    n := m + len(data)

    if n > cap(slice) { // if necessary, reallocate
        // allocate double what's needed, for future growth.
        newSlice := make([]byte, (n+1)*2)
        copy(newSlice, slice)
        slice = newSlice
    }
    slice = slice[0:n]

    copy(slice[m:n], data)
    return slice
}

One could use AppendByte like this:

1
2
3
4
p := []byte{2, 3, 5}
p = AppendByte(p, 7, 11, 13)

// p == []byte{2, 3, 5, 7, 11, 13}

Functions like AppendByte are useful because they offer complete control over the way the slice is grown. Depending on the characteristics of the program, it may be desirable to allocate in smaller or larger chunks, or to put a ceiling on the size of a reallocation.

像 AppendByte 这样的函数非常有用. 因为它更能对 slice 的扩容有着完整的控制。根据具体的项目, 可能需要分配更小或者更大的空间来获得最佳效果. 或者是对扩容的上限给出限制。

But most programs don’t need complete control, so Go provides a built-in append function that’s good for most purposes; it has the signature

但是多数的项目不需要完整的控制. 所以 Go 提供了 append 函数.它能满足大多数的需求. 签名如下:

1
func append(s []T, x ...T) []T

The append function appends the elements x to the end of the slice s, and grows the slice if a greater capacity is needed.

1
2
3
4
a := make([]int, 1)
// a == []int{0}
a = append(a, 1, 2, 3)
// a == []int{0, 1, 2, 3}

To append one slice to another, use … to expand the second argument to a list of arguments.

1
2
3
4
a := []string{"John", "Paul"}
b := []string{"George", "Ringo", "Pete"}
a = append(a, b...) // equivalent to "append(a, b[0], b[1], b[2])"
// a == []string{"John", "Paul", "George", "Ringo", "Pete"}

Since the zero value of a slice (nil) acts like a zero-length slice, you can declare a slice variable and then append to it in a loop:

因为一个切片的零值(nil) 就好比是长度为 0 的切片,所以你可以声明一个空的 slice 变量,并向里面挨个添加数据.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Filter returns a new slice holding only
// the elements of s that satisfy f()
func Filter(s []int, fn func(int) bool) []int {
    var p []int // == nil
    for _, v := range s {
        if fn(v) {
            p = append(p, v)
        }
    }
    return p
}

A possible “gotcha”

可能的 “陷阱”

As mentioned earlier, re-slicing a slice doesn’t make a copy of the underlying array. The full array will be kept in memory until it is no longer referenced. Occasionally this can cause the program to hold all the data in memory when only a small piece of it is needed.

For example, this FindDigits function loads a file into memory and searches it for the first group of consecutive numeric digits, returning them as a new slice.

就像先前提到的,切片操作 并不会创建底层数组的拷贝。整个数组都将被保存在内存中,直到它不再被引用。有时候可能会因为一个小的内存引用导致保存所有的数据。

例如,FindDigits 函数加载整个文件到内存,然后搜索第一个连续的数字,最后结果以切片方式返回。

1
2
3
4
5
6
var digitRegexp = regexp.MustCompile("[0-9]+")

func FindDigits(filename string) []byte {
    b, _ := ioutil.ReadFile(filename) 
    return digitRegexp.Find(b)
}

This code behaves as advertised, but the returned []byte points into an array containing the entire file. Since the slice references the original array, as long as the slice is kept around the garbage collector can’t release the array; the few useful bytes of the file keep the entire contents in memory.

这段代码的行为和描述类似,返回的 []byte 指向保存整个文件的数组。因为切片引用了原始的数组,导致 GC 不能释放数组的空间;只用到少数几个字节却导致整个文件的内容都一直保存在内存里。

To fix this problem one can copy the interesting data to a new slice before returning it:

要修复这个问题,在返回之前可以将感兴趣的数据复制到一个新的切片中:

1
2
3
4
5
6
7
8
func CopyDigits(filename string) []byte {
    b, _ := ioutil.ReadFile(filename) 
    b = digitRegexp.Find(b)

    c := make([]byte, len(b))
    copy(c, b)
    return c
}

使用 append 函数改进:

1
2
3
4
5
6
7
func CopyDigits(filename string) []byte {
    b, _ := ioutil.ReadFile(filename)
    b = digitRegexp.Find(b)

    c := make([]byte, len(b))
    return append(c, b...)
}

Further Reading

Effective Go contains an in-depth treatment(探讨) of slices and arrays, and the Go language specification defines slices and their associated helper functions.

Effective Go | Slices

Slices wrap arrays to give a more general, powerful, and convenient interface to sequences of data. Except for items with explicit dimension such as transformation matrices, most array programming in Go is done with slices rather than simple arrays.

切片通过对数组封装进行封装,提供了更加通用的,强大的和方便的接口来序列化数据。除了像矩阵变换这种有明确唯独需求的情况外,Go 中的大部分数组编程都是通过切片来完成的。

Slices hold references to an underlying array, and if you assign one slice to another, both refer to the same array. If a function takes a slice argument, changes it makes to the elements of the slice will be visible to the caller, analogous to passing a pointer to the underlying array(类似于传递了一个指向底层数组的指针). A Read function can therefore accept a slice argument rather than a pointer and a count; the length within the slice sets an upper limit of how much data to read. Here is the signature of the Read method of the File type in package os:

1
func (f *file) Read(buf []byte) (n int, err error)

The method returns the number of bytes read and an error value, if any(如果有的话). To read into the first 32 bytes of a larger buffer buf, slice (here used as a verb) the buffer.

该方法返回读取的字节数和一个错误值(若有的话)。若要从更大的缓冲区 buf 中读取前 32 个字节,只需要对其进行 切片 即可。

1
n, err := f.Read(buf[0:32])

Such slicing is common and efficient. In fact, leaving efficiency aside for the moment, the following snippet would also read the first 32 bytes of the buffer.

这种切片的方法常用且高效。若不谈效率,以下片段同样能读取该缓冲区的前32个字节。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
var n int 
var err error
for i := 0; i < 32; i++ {
    nbytes, e := f.Read(buf[i:i+1]) // 读取一个字节
    if nbytes == 0 || e != nil {
        err = e
        break
    }
    n += nbytes
}

The length of a slice may be changed as long as it still fits within the limits of the underlying array; just assign it to a slice of itself. The capacity of a slice, accessible by the built-in function cap, reports the maximum length the slice may assume. Here is a function to append data to a slice. If the data exceeds the capacity, the slice is reallocated. The resulting slice is returned. The function uses the fact that len and cap are legal when applied to the nil slice, and return 0.

在切片的长度没超过底层数组的限制,就可以改变它的长度,只需要将它赋予其自身的切片即可。切片的 容量 可以通过内建函数 cap 获得,它将返回这个切片能取到的最大长度。以下是将数据追加到切片的函数。若数据超出其容量,则该切片将会重新被分配。返回值即为所得的切片(新分配的)。该函数中所使用的 len 和 cpa 在 nil 切片上应用时是合法的,会返回 0.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
func Append(slice, data []byte) []byte {
    l := len(slice)
    if l + len(data) > cap(slice) { // 重新分配
        // 为了后面的增长,需分配两份。
        newSlice := make([]byte, (l + len(data)*2))
        copy(newSlice, slice)
        slice = newSlice
    }
    slice = slice[0:l+len(data)]
    for i, c := range data {
        slice[l+i] = c
    }
    return slice
}

We must return the slice afterwards because, although Append can modify the elements of slice, the slice itself (the run-time data structure holding the pointer, length, and capacity) is passed by value.

The idea of appending to a slice is so useful it’s captured by the append built-in function. To understand that function’s design, though, we need a little more information, so we’ll return to it later.

See Also

Thanks to the authors 🙂