Golang | Performance Optimization => pprof practice
Contents
- pprof 小练习
注册自定义的 pprof 接口(基于httprouter)
注册接口函数
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
// 使用自定义的 endpoint 提供 pprof 分析, 采用 httprouter 提供路由 func main() { router := httprouter.New() router.HandlerFunc("GET", "/debug/pprof/", pprof.Index) router.HandlerFunc("GET", "/debug/pprof/profile", pprof.Profile) router.HandlerFunc("GET", "/debug/pprof/trace", pprof.Trace) router.HandlerFunc("GET", "/debug/pprof/symbol", pprof.Symbol) router.HandlerFunc("GET", "/debug/pprof/cmdline", pprof.Cmdline) router.HandlerFunc("GET", "/debug/pprof/heap", pprof.Handler("heap" .ServeHTTP) router.HandlerFunc("GET", "/debug/pprof/goroutine", pprof.Handle ("goroutine").ServeHTTP) router.HandlerFunc("GET", "/debug/pprof/block", pprof.Handler("block" .ServeHTTP) router.HandlerFunc("GET", "/debug/pprof/threadcreate", pprof.Handle ("threadcreate").ServeHTTP) router.HandlerFunc("GET", "/debug/pprof/mutex", pprof.Handler("mutex" .ServeHTTP) log.Fatal(http.ListenAndServe(":8082", router)) }
- 运行服务
Note: 如果启用了 Caddy 做反向代理,则需开启 proxy /debug host:ip 转发
- 打开浏览器,进入 debug/pporf 的索引页面, 会有如下的输出:
1 2 3 4 5 6 7 8 9 10
/debug/pprof/ profiles: 0 block 447 goroutine 816 heap 0 mutex 5 threadcreate full goroutine stack dump
Note: 这个索引页面上所有可统计的数据,包括 go tool pprof 采集到的数据都依赖于进程中的 pprof 采样率, 默认 512KB 采样一次, 当需求的数据不够精确时,可以调节采样率:runtime.MemProfileRate, 但是采样率越低,进程云进行速度越慢。
分析 Memory
通过索引页,查看 heap 信息
https://host:ip/debug/pprof/heap?debug=1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
heap profile: 4: 1136 [12390: 25948656] @ heap/1048576 2: 768 [2: 768] @ 0x433b71 0x43408d 0x455ce6 0x456739 0x42f050 # 0x433b70 runtime.malg+0x30 D:/Go/src/runtime/proc.go:3221 # 0x43408c runtime.newproc1+0x42c D:/Go/src/runtime/proc.go:3273 # 0x455ce5 runtime.newproc.func1+0x45 D:/Go/src/runtime/proc.go:3244 # 0x456738 runtime.systemstack+0x78 D:/Go/src/runtime/asm_amd64.s:409 # 0x42f050 runtime.mstart+0x0 D:/Go/src/runtime/proc.go:1175 1: 352 [137: 48224] @ 0x8a9f82 0x76d1f7 0x7726f8 0x6beb00 0x76310e 0x66b5e4 0x66e28c 0x66a601 0x4590a1 # 0x8a9f81 qx-api/src/resource.(*Robot).Create+0x41 D:/code/Go_Path/src/qx-api/src/resource/robot.go:59 # 0x76d1f6 qx-api/src/api.(*resource).handleCreate+0x186 D:/code/Go_Path/src/qx-api/src/api/resource.go:121 # 0x7726f7 qx-api/src/api.(*API).addResource.func2+0x407 D:/code/Go_Path/src/qx-api/src/api/api.go:96 # 0x6beaff github.com/julienschmidt/httprouter.(*Router).ServeHTTP+0x76f D:/code/Go_Path/src/github.com/julienschmidt/httprouter/router.go:344 # 0x76310d github.com/rs/cors.(*Cors).Handler.func1+0xfd D:/code/Go_Path/src/github.com/rs/cors/cors.go:190 # 0x66b5e3 net/http.HandlerFunc.ServeHTTP+0x43 D:/Go/src/net/http/server.go:1947 # 0x66e28b net/http.serverHandler.ServeHTTP+0xbb D:/Go/src/net/http/server.go:2694 # 0x66a600 net/http.(*conn).serve+0x650 D:/Go/src/net/http/server.go:1830 # runtime.MemStats # Alloc = 4693984 # TotalAlloc = 6570567768 # Sys = 15476984 # Lookups = 6441 # Mallocs = 106981880 # Frees = 106936207 # HeapAlloc = 4693984 # HeapSys = 10321920 # HeapIdle = 4726784 # HeapInuse = 5595136 # HeapReleased = 2342912 # HeapObjects = 45673 # Stack = 2260992 / 2260992 # MSpan = 95608 / 147456 # MCache = 1736 / 16384 # BuckHashSys = 1700763 # GCSys = 610304 # OtherSys = 419165 # NextGC = 5206176 # LastGC = 1529050738440737221 # PauseNs = [24391 22047 27986 23706 33849 31450 29197 26735 40822 26734 25389 24701 32894 38224 22443 23303 31302 66411 73710 .......] # PauseEnd = [1529050573425521907 1529050575212897649 1529050576923992459 1529050579026633238 1529050580885216128 1529050582768778420 .......] # NumGC = 2905 # NumForcedGC = 0 # GCCPUFraction = 0.0006629790327640667 # DebugGC = false
Note: 其中显示的内容会比较多,但是主体分为2个部分: - 第一部分:打印的是通过 runtime.MemProfile() 获取的 runtime.MemProfileRecord 记录。其含义为:
1 2 3 heap profile: 4(inused_objects): 1136(inused_bytes) [12390((alloc objects)): 25948656(alloc bytes)] @ heap/1048576((2*MemProfileRate)) 2: 768 [2: 768] @ 0x433b71 0x43408d 0x455ce6 0x456739 0x42f050(0xxxx表示: 记录中的栈指针) # 0x433b70 runtime.malg+0x30 D:/Go/src/runtime/proc.go:3221 (栈信息)
- 第二部分:打印的是通过 runtime.ReadMemStats() 读取的 runtime.MemStats 信息
- Sys: 进程从系统获得的内存空间,虚拟地址空间(单位:bytes)
- HeapAlloc: 进程对内存分配使用的内存空间, 通常是用户 new 出来的堆对象, 包含未被 gc 掉的
- HeapSys: 进程从系统获得的堆内存,因为 golang 底层使用 TCmalloc 机制,会缓存一部分堆内存,虚拟地址空间
- PauseNs: 记录每次 gc 暂停的时间(单位:ns),最多记录 256 个最新记录
- NumGC: 记录GC发生的次数
通过 web 可视化视图查看内存分配信息
执行命令:默认采集 inuse_space, 加 -alloc_space 采集程序整个生命周期内的内存分配
1 2
$ go tool pprof https://host:ip:1013/debug/pprof/heap $ go tool pprof -alloc_space https://host:ip/debug/pprof/heap
进入交互式命令行: 输入
web
命令,进入可视化界面- inuse_space
- alloc_space
总结:
不管是使用
top
命令查看输出的内存分配率,还是使用web
命令查看可视化的分配图,都可以看出在string-[]byte相互转换
、分配map: reflect.mapassign
、创建切片:bytes.makeSlice
、json编解码:encoding/json.Unmarshal
等函数调用上,累计分配的内存较多。
- 针对该问题就可以进行代码 review, 设计如何减少这些函数的调用次数或者优化相关代码逻辑
分析 CPU
执行命令:
1
go tool pprof https://open.robot-qixing.com:1013/debug/pprof/profile
waitting 30s …
进入交互式命令行: 输入
web
命令,进入可视化界面1 2 3 4 5 6 7 8 9 10 11 12 13 14
[zhe@zhe ~]$ go tool pprof https://open.robot-qixing.com:1013/debug/pprof/profile Fetching profile over HTTP from https://open.robot-qixing.com:1013/debug/pprof/profile Local symbolization failed for app: open /home/api/app: The system cannot find the path specified. Some binary filenames not available. Symbolization may be incomplete. Try setting PPROF_BINARY_PATH to the search path for local binaries. Saved profile in C:\Users\zhe\pprof\pprof.app.samples.cpu.002.pb.gz File: app Type: cpu Time: Jun 15, 2018 at 3:33pm (CST) Duration: 30s, Total samples = 1.20s ( 4.00%) Entering interactive mode (type "help" for commands, "o" for options) (pprof) (pprof) (pprof) web
See Also
Thanks to the authors 🙂