• pprof 小练习

实践练习项目:Golang profiling and optimizing

注册自定义的 pprof 接口(基于httprouter)

  1. 注册接口函数

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    
    // 使用自定义的 endpoint 提供 pprof 分析, 采用 httprouter 提供路由
    func main() {
        router := httprouter.New()
        router.HandlerFunc("GET", "/debug/pprof/", pprof.Index)
        router.HandlerFunc("GET", "/debug/pprof/profile", pprof.Profile)
        router.HandlerFunc("GET", "/debug/pprof/trace", pprof.Trace)
        router.HandlerFunc("GET", "/debug/pprof/symbol", pprof.Symbol)
        router.HandlerFunc("GET", "/debug/pprof/cmdline", pprof.Cmdline)    
        router.HandlerFunc("GET", "/debug/pprof/heap", pprof.Handler("heap" .ServeHTTP)
        router.HandlerFunc("GET", "/debug/pprof/goroutine", pprof.Handle    ("goroutine").ServeHTTP)
        router.HandlerFunc("GET", "/debug/pprof/block", pprof.Handler("block"   .ServeHTTP)
        router.HandlerFunc("GET", "/debug/pprof/threadcreate", pprof.Handle ("threadcreate").ServeHTTP)
        router.HandlerFunc("GET", "/debug/pprof/mutex", pprof.Handler("mutex"   .ServeHTTP)    
        log.Fatal(http.ListenAndServe(":8082", router))
    }
    1. 运行服务

    Note: 如果启用了 Caddy 做反向代理,则需开启 proxy /debug host:ip 转发

    1. 打开浏览器,进入 debug/pporf 的索引页面, 会有如下的输出:

    https://host:ip/debug/pprof

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    
    /debug/pprof/
    
    profiles:
    0   block
    447 goroutine
    816 heap
    0   mutex
    5   threadcreate
    
    full goroutine stack dump

    Note: 这个索引页面上所有可统计的数据,包括 go tool pprof 采集到的数据都依赖于进程中的 pprof 采样率, 默认 512KB 采样一次, 当需求的数据不够精确时,可以调节采样率:runtime.MemProfileRate, 但是采样率越低,进程云进行速度越慢。

    分析 Memory

    通过索引页,查看 heap 信息

    https://host:ip/debug/pprof/heap?debug=1

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    
    heap profile: 4: 1136 [12390: 25948656] @ heap/1048576
    2: 768 [2: 768] @ 0x433b71 0x43408d 0x455ce6 0x456739 0x42f050
    #	0x433b70	runtime.malg+0x30		D:/Go/src/runtime/proc.go:3221
    #	0x43408c	runtime.newproc1+0x42c		D:/Go/src/runtime/proc.go:3273
    #	0x455ce5	runtime.newproc.func1+0x45	D:/Go/src/runtime/proc.go:3244
    #	0x456738	runtime.systemstack+0x78	D:/Go/src/runtime/asm_amd64.s:409
    #	0x42f050	runtime.mstart+0x0		D:/Go/src/runtime/proc.go:1175
    
    1: 352 [137: 48224] @ 0x8a9f82 0x76d1f7 0x7726f8 0x6beb00 0x76310e 0x66b5e4 0x66e28c 0x66a601 0x4590a1
    #	0x8a9f81	qx-api/src/resource.(*Robot).Create+0x41			D:/code/Go_Path/src/qx-api/src/resource/robot.go:59
    #	0x76d1f6	qx-api/src/api.(*resource).handleCreate+0x186			D:/code/Go_Path/src/qx-api/src/api/resource.go:121
    #	0x7726f7	qx-api/src/api.(*API).addResource.func2+0x407			D:/code/Go_Path/src/qx-api/src/api/api.go:96
    #	0x6beaff	github.com/julienschmidt/httprouter.(*Router).ServeHTTP+0x76f	D:/code/Go_Path/src/github.com/julienschmidt/httprouter/router.go:344
    #	0x76310d	github.com/rs/cors.(*Cors).Handler.func1+0xfd			D:/code/Go_Path/src/github.com/rs/cors/cors.go:190
    #	0x66b5e3	net/http.HandlerFunc.ServeHTTP+0x43				D:/Go/src/net/http/server.go:1947
    #	0x66e28b	net/http.serverHandler.ServeHTTP+0xbb				D:/Go/src/net/http/server.go:2694
    #	0x66a600	net/http.(*conn).serve+0x650					D:/Go/src/net/http/server.go:1830
    
    
    # runtime.MemStats
    # Alloc = 4693984
    # TotalAlloc = 6570567768
    # Sys = 15476984
    # Lookups = 6441
    # Mallocs = 106981880
    # Frees = 106936207
    # HeapAlloc = 4693984
    # HeapSys = 10321920
    # HeapIdle = 4726784
    # HeapInuse = 5595136
    # HeapReleased = 2342912
    # HeapObjects = 45673
    # Stack = 2260992 / 2260992
    # MSpan = 95608 / 147456
    # MCache = 1736 / 16384
    # BuckHashSys = 1700763
    # GCSys = 610304
    # OtherSys = 419165
    # NextGC = 5206176
    # LastGC = 1529050738440737221
    # PauseNs = [24391 22047 27986 23706 33849 31450 29197 26735 40822 26734 25389 24701 32894 38224 22443 23303 31302 66411 73710 .......]
    # PauseEnd = [1529050573425521907 1529050575212897649 1529050576923992459 1529050579026633238 1529050580885216128 1529050582768778420 .......]
    # NumGC = 2905
    # NumForcedGC = 0
    # GCCPUFraction = 0.0006629790327640667
    # DebugGC = false

Note: 其中显示的内容会比较多,但是主体分为2个部分: - 第一部分:打印的是通过 runtime.MemProfile() 获取的 runtime.MemProfileRecord 记录。其含义为:

1
2
3
   heap profile: 4(inused_objects): 1136(inused_bytes) [12390((alloc objects)): 25948656(alloc bytes)] @ heap/1048576((2*MemProfileRate))
   2: 768 [2: 768] @ 0x433b71 0x43408d 0x455ce6 0x456739 0x42f050(0xxxx表示: 记录中的栈指针)
   #  0x433b70    runtime.malg+0x30       D:/Go/src/runtime/proc.go:3221 (栈信息)
  • 第二部分:打印的是通过 runtime.ReadMemStats() 读取的 runtime.MemStats 信息
    • Sys: 进程从系统获得的内存空间,虚拟地址空间(单位:bytes)
    • HeapAlloc: 进程对内存分配使用的内存空间, 通常是用户 new 出来的堆对象, 包含未被 gc 掉的
    • HeapSys: 进程从系统获得的堆内存,因为 golang 底层使用 TCmalloc 机制,会缓存一部分堆内存,虚拟地址空间
    • PauseNs: 记录每次 gc 暂停的时间(单位:ns),最多记录 256 个最新记录
    • NumGC: 记录GC发生的次数

通过 web 可视化视图查看内存分配信息

  1. 执行命令:默认采集 inuse_space, 加 -alloc_space 采集程序整个生命周期内的内存分配

    1
    2
    
    $ go tool pprof https://host:ip:1013/debug/pprof/heap 
    $ go tool pprof -alloc_space https://host:ip/debug/pprof/heap 
  2. 进入交互式命令行: 输入 web 命令,进入可视化界面

    • inuse_space

    • alloc_space

总结:

  • 不管是使用 top 命令查看输出的内存分配率,还是使用 web 命令查看可视化的分配图,都可以看出在 string-[]byte相互转换分配map: reflect.mapassign创建切片:bytes.makeSlicejson编解码:encoding/json.Unmarshal 等函数调用上,累计分配的内存较多。

    • 针对该问题就可以进行代码 review, 设计如何减少这些函数的调用次数或者优化相关代码逻辑

分析 CPU

  1. 执行命令:

    1
    
    go tool pprof https://open.robot-qixing.com:1013/debug/pprof/profile

    waitting 30s …

  2. 进入交互式命令行: 输入 web 命令,进入可视化界面

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    
    [zhe@zhe ~]$ go tool pprof https://open.robot-qixing.com:1013/debug/pprof/profile
    Fetching profile over HTTP from https://open.robot-qixing.com:1013/debug/pprof/profile
    Local symbolization failed for app: open /home/api/app: The system cannot find the path specified.
    Some binary filenames not available. Symbolization may be incomplete.
    Try setting PPROF_BINARY_PATH to the search path for local binaries.
    Saved profile in C:\Users\zhe\pprof\pprof.app.samples.cpu.002.pb.gz
    File: app
    Type: cpu
    Time: Jun 15, 2018 at 3:33pm (CST)
    Duration: 30s, Total samples = 1.20s ( 4.00%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)
    (pprof)
    (pprof) web

    See Also

    Thanks to the authors 🙂