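Background
- The Go process has no debugging tooling such as pprof built in
- The Go process is deadlocked while running in production
- Killing the process is an option, but only if the kill still lets us find the root cause; otherwise the evidence is lost and the problem cannot be debugged again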
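About this problem
Usually you can debug by changing the program's code.
This can be called instrumentation: you add debugging instrumentation to help understand the bug, then run the problematic operation again.
Instrumentation can be "print statements", or something more elegant such as adding debugger breakpoints, or even building your code unmodified but asking the compiler to include debug symbols.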
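To make "instrumentation" concrete, here is a minimal sketch of what such added code could look like in Go. The helper name dumpAllGoroutines is my own and not from the original post; it only uses runtime.Stack from the standard library, and it assumes you are able to rebuild and redeploy the binary.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// dumpAllGoroutines returns the stack traces of all goroutines as text.
// (Hypothetical helper name, used for illustration only.)
func dumpAllGoroutines() string {
	buf := make([]byte, 64<<10)
	for {
		// all=true asks the runtime for every goroutine, not just the caller.
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		// The buffer was too small and the output was truncated: grow and retry.
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	// Write the dump to stderr, the same place the runtime uses for its own dumps.
	fmt.Fprint(os.Stderr, dumpAllGoroutines())
}
```

A helper like this could be called from a debug endpoint or a signal handler, but it only helps if it was compiled in before the problem appeared.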
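But sometimes the problem occurs so rarely that you cannot afford to rebuild (and therefore re-run) the binary, and can only debug the process that is already running. This post is about that situation, using Go.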
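Option 1: attach a debugger to the running program
You can attach a debugger (for example Delve) to an existing process. No recompiling or added instrumentation is needed.
Suppose our process has PID 4040133: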
$ sudo ./dlv attach 4040133
Type 'help' for list of commands.
(dlv) goroutines
... goroutines' state is dumped here ...
That was easy! Delve is of course more powerful: you can set breakpoints, watch variables, step through code, and so on.
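Option 2: exit with a stack trace, when you can see the process's stderr
Go offers this nice feature out of the box: when you send the process a SIGQUIT signal, it exits with a stack dump. The dump shows the stack of every goroutine, so you can tell what each "thread" was doing when SIGQUIT was received. (This is the runtime's default behavior; a program that installs its own SIGQUIT handler overrides it.)
In practice this stack trace is really valuable. Let's learn how to dig into it.
You can make the process write the dump to its stderr by running (still assuming our PID is 4040133):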
$ kill -QUIT 4040133
In the other terminal where the program is running (or wherever it writes its stderr, perhaps $ journalctl if your application runs under systemd) you will see:
SIGQUIT: quit
PC=0x464ce1 m=0 sigcode=0
goroutine 0 [idle]:
runtime.futex()
/usr/local/go/src/runtime/sys_linux_amd64.s:552 +0x21
runtime.futexsleep(0x7fff3a356560?, 0x441df3?, 0xc000032000?)
/usr/local/go/src/runtime/os_linux.go:56 +0x36
runtime.notesleep(0xbf91d0)
/usr/local/go/src/runtime/lock_futex.go:159 +0x87
runtime.mPark()
/usr/local/go/src/runtime/proc.go:1447 +0x2a
runtime.stoplockedm()
/usr/local/go/src/runtime/proc.go:2611 +0x65
runtime.schedule()
/usr/local/go/src/runtime/proc.go:3308 +0x3d
runtime.park_m(0xc0001251e0?)
/usr/local/go/src/runtime/proc.go:3525 +0x14d
runtime.mcall()
/usr/local/go/src/runtime/asm_amd64.s:425 +0x43
goroutine 1 [chan receive, 21508 minutes]:
github.com/function61/gokit/sync/taskrunner.(*Runner).Wait(...)
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:79
github.com/joonas-fi/joonas-sys/pkg/statusbar.logic({0x96f6f8, 0xc000030cc0})
/workspace/pkg/statusbar/bar.go:150 +0x1bf
github.com/joonas-fi/joonas-sys/pkg/statusbar.Entrypoint.func1(0xc0001bc780?, {0xc28ab8?, 0x0?, 0x0?})
/workspace/pkg/statusbar/bar.go:34 +0x25
github.com/spf13/cobra.(*Command).execute(0xc0001bc780, {0xc28ab8, 0x0, 0x0})
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:860 +0x663
github.com/spf13/cobra.(*Command).ExecuteC(0xc000187b80)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:974 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/cobra@v1.2.1/command.go:902
main.main()
/workspace/cmd/jsys/main.go:42 +0x434
goroutine 17 [syscall, 21508 minutes]:
os/signal.signal_recv()
/usr/local/go/src/runtime/sigqueue.go:168 +0x98
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
/usr/local/go/src/os/signal/signal.go:151 +0x2a
goroutine 18 [chan receive, 21508 minutes]:
github.com/function61/gokit/os/osutil.CancelOnInterruptOrTerminate.func1()
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/os/osutil/canceloninterruptorterminate.go:32 +0x4d
created by github.com/function61/gokit/os/osutil.CancelOnInterruptOrTerminate
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/os/osutil/canceloninterruptorterminate.go:31 +0x10a
goroutine 19 [syscall, 4080 minutes]:
syscall.Syscall(0x0, 0x0, 0xc0000ea3e4, 0xc1c)
/usr/local/go/src/syscall/asm_linux_amd64.s:20 +0x5
syscall.read(0xc000072060?, {0xc0000ea3e4?, 0x9?, 0xc0002e2ea0?})
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:696 +0x4d
syscall.Read(...)
/usr/local/go/src/syscall/syscall_unix.go:188
internal/poll.ignoringEINTRIO(...)
/usr/local/go/src/internal/poll/fd_unix.go:794
internal/poll.(*FD).Read(0xc000072060?, {0xc0000ea3e4?, 0xc1c?, 0xc1c?})
/usr/local/go/src/internal/poll/fd_unix.go:163 +0x285
os.(*File).read(...)
/usr/local/go/src/os/file_posix.go:31
os.(*File).Read(0xc00000e010, {0xc0000ea3e4?, 0x1?, 0x120?})
/usr/local/go/src/os/file.go:119 +0x5e
bufio.(*Scanner).Scan(0xc0000e3ef8)
/usr/local/go/src/bufio/scan.go:215 +0x865
github.com/joonas-fi/joonas-sys/pkg/statusbar.logic.func1({0x0?, 0x0?})
/workspace/pkg/statusbar/bar.go:61 +0x89
github.com/function61/gokit/sync/taskrunner.(*Runner).Start.func1()
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:51 +0x45
created by github.com/function61/gokit/sync/taskrunner.(*Runner).Start
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:50 +0x105
goroutine 23 [chan receive, 1390 minutes]:
github.com/function61/gokit/sync/taskrunner.(*Runner).waitInternal.func2(...)
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:101
github.com/function61/gokit/sync/taskrunner.(*Runner).waitInternal(0xc0000b2140)
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:134 +0x30a
github.com/function61/gokit/sync/taskrunner.(*Runner).Done.func1.1()
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:63 +0x25
created by github.com/function61/gokit/sync/taskrunner.(*Runner).Done.func1
/go/pkg/mod/github.com/function61/gokit@v0.0.0-20211228101508-315ec8b830c9/sync/taskrunner/taskrunner.go:62 +0x5a
goroutine 34 [chan send, 1390 minutes]:
github.com/vishvananda/netlink.routeSubscribeAt.func2()
/go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/route_linux.go:1075 +0x453
created by github.com/vishvananda/netlink.routeSubscribeAt
/go/pkg/mod/github.com/vishvananda/netlink@v1.1.0/route_linux.go:1037 +0x2f2
rax 0xca
rbx 0x0
rcx 0x464ce3
rdx 0x0
rdi 0xbf91d0
rsi 0x80
rbp 0x7fff3a356530
rsp 0x7fff3a3564e8
r8 0x0
r9 0x0
r10 0x0
r11 0x286
r12 0x43c400
r13 0x0
r14 0xbf8940
r15 0x7fb0d47ba96c
rip 0x464ce1
rflags 0x286
cs 0x33
fs 0x0
gs 0x0
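One underrated thing about the dump is that it shows how long each goroutine has been waiting on its syscall or event! I knew the trouble with my process began about 23 hours 15 minutes ago, and 1390 minutes matches that almost exactly!
From the stack dump above I was able to work out where the bug was.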
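With many goroutines, picking out the long waiters by eye gets tedious. The following is a rough sketch, not part of the original post, that pulls the "goroutine N [state, M minutes]:" header lines out of a dump and sorts them by wait time. The type and function names are made up for illustration, and the regular expression only covers the header shapes seen in the dump above (plus the optional "locked to thread" suffix).

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// goroutineHeader holds the fields of one "goroutine N [state, M minutes]:" line.
type goroutineHeader struct {
	ID      int
	State   string
	Minutes int // 0 when the header carries no wait time
}

// headerRe matches lines such as "goroutine 34 [chan send, 1390 minutes]:".
var headerRe = regexp.MustCompile(
	`^goroutine (\d+) \[([^,\]]+)(?:, (\d+) minutes)?(?:, locked to thread)?\]:$`)

// parseHeaders extracts the goroutine headers from a SIGQUIT dump and
// returns them sorted by wait time, longest first.
func parseHeaders(dump string) []goroutineHeader {
	var out []goroutineHeader
	for _, line := range strings.Split(dump, "\n") {
		m := headerRe.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue // stack frames, registers, and so on
		}
		id, _ := strconv.Atoi(m[1])
		minutes, _ := strconv.Atoi(m[3]) // an empty match leaves minutes at 0
		out = append(out, goroutineHeader{ID: id, State: m[2], Minutes: minutes})
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Minutes > out[j].Minutes })
	return out
}

func main() {
	// A few header lines copied from the dump above.
	dump := "goroutine 0 [idle]:\n" +
		"goroutine 1 [chan receive, 21508 minutes]:\n" +
		"goroutine 19 [syscall, 4080 minutes]:\n" +
		"goroutine 34 [chan send, 1390 minutes]:\n"
	for _, h := range parseHeaders(dump) {
		fmt.Printf("goroutine %d: %s, %d minutes\n", h.ID, h.State, h.Minutes)
	}
}
```

Goroutines whose wait time lines up with when the problem started, such as the two at 1390 minutes here, are a good place to begin reading.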
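Option 3: exit with a stack trace, when you cannot see the process's stderr
If you are not sure where the process's stderr goes, I suggest first checking whether there is a simple solution. Suppose your process ID is 4040133. Look up file descriptor #2 (which is always stderr) to see where its stderr is connected: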
$ ls -al /proc/4040133/fd/2
l-wx------ 1 joonas joonas 64 Feb 20 19:37 /proc/4040133/fd/2 -> /home/joonas/.xsession-errors
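In my case the program was running under the X.org server, and stderr was simply being written to my .xsession-errors file. Had I realized that earlier, I could have saved myself the trouble.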
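The same lookup can be done from Go, which may be handy inside a diagnostics tool. This is a small sketch that assumes Linux with /proc mounted; fdLink and stderrTarget are hypothetical names, and only os.Readlink does the real work.

```go
package main

import (
	"fmt"
	"os"
)

// fdLink returns the /proc path for file descriptor fd of process pid (Linux only).
func fdLink(pid, fd int) string {
	return fmt.Sprintf("/proc/%d/fd/%d", pid, fd)
}

// stderrTarget resolves where stderr (file descriptor 2) of the process points.
func stderrTarget(pid int) (string, error) {
	return os.Readlink(fdLink(pid, 2))
}

func main() {
	// Inspect this process itself so the example runs without a second process.
	target, err := stderrTarget(os.Getpid())
	if err != nil {
		fmt.Println("could not resolve stderr:", err)
		return
	}
	fmt.Println("stderr ->", target)
}
```

For another user's process the lookup needs matching privileges, just like the ls command above.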
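Since at the time I did not know where stderr was being written, I went for the nuclear option.
(This works even when you decided stderr was worthless and redirected it to /dev/null!!!)
The trick is to run $ strace against your existing process and capture its write(2, ...) syscalls. The first argument of the write() syscall is the file descriptor number, and 2 again means stderr.
So attach strace to the process: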
$ sudo strace -p 2143770 -s 512 -ewrite 2> /tmp/strace.log
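Then, in another terminal, ask your process to quit (this triggers the Go runtime to write the stack trace, which would normally end up discarded by being written to /dev/null):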
$ kill -QUIT 2143770
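The process now dumps its stack trace to /dev/null, but it has to issue syscalls to do so, and strace records those for you.
When you look at the log file /tmp/strace.log, it looks like this: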
strace: Process 4040133 attached
--- SIGQUIT {si_signo=SIGQUIT, si_code=SI_USER, si_pid=2140770, si_uid=1000} ---
write(2, "SIGQUIT: quit", 13) = 13
write(2, "\n", 1) = 1
write(2, "PC=", 3) = 3
write(2, "0x464ce1", 8) = 8
write(2, " m=", 3) = 3
write(2, "0", 1) = 1
write(2, " sigcode=", 9) = 9
write(2, "0", 1) = 1
write(2, "\n", 1) = 1
write(2, "\n", 1) = 1
write(2, "goroutine ", 10) = 10
write(2, "0", 1) = 1
write(2, " [", 2) = 2
write(2, "idle", 4) = 4
write(2, "]:\n", 3) = 3
write(2, "runtime.futex", 13) = 13
write(2, "(", 1) = 1
write(2, ")\n", 2) = 2
write(2, "\t", 1) = 1
write(2, "/usr/local/go/src/runtime/sys_linux_amd64.s", 43) = 43
write(2, ":", 1) = 1
write(2, "552", 3) = 3
write(2, " +", 2) = 2
write(2, "0x21", 4) = 4
write(2, "\n", 1) = 1
write(2, "runtime.futexsleep", 18) = 18
write(2, "(", 1) = 1
write(2, "0x7ffc12443b70", 14) = 14
write(2, "?", 1) = 1
write(2, ", ", 2) = 2
write(2, "0x441df3", 8) = 8
write(2, "?", 1) = 1
write(2, ", ", 2) = 2
write(2, "0xc000036500", 12) = 12
write(2, "?", 1) = 1
write(2, ")\n", 2) = 2
write(2, "\t", 1) = 1
write(2, "/usr/local/go/src/runtime/os_linux.go", 37) = 37
write(2, ":", 1) = 1
write(2, "56", 2) = 2
write(2, " +", 2) = 2
write(2, "0x36", 4) = 4
write(2, "\n", 1) = 1
write(2, "runtime.notesleep", 17) = 17
write(2, "(", 1) = 1
write(2, "0xbfd370", 8) = 8
write(2, ")\n", 2) = 2
write(2, "\t", 1) = 1
... output snipped ...
write(2, "0xbfcae0", 8) = 8
write(2, "\n", 1) = 1
write(2, "r15    ", 7) = 7
write(2, "0x7ff9ba37ce03", 14) = 14
write(2, "\n", 1) = 1
write(2, "rip    ", 7) = 7
write(2, "0x464ce1", 8) = 8
write(2, "\n", 1) = 1
write(2, "rflags ", 7) = 7
write(2, "0x286", 5) = 5
write(2, "\n", 1) = 1
write(2, "cs     ", 7) = 7
write(2, "0x33", 4) = 4
write(2, "\n", 1) = 1
write(2, "fs     ", 7) = 7
write(2, "0x0", 3) = 3
write(2, "\n", 1) = 1
write(2, "gs     ", 7) = 7
write(2, "0x0", 3) = 3
write(2, "\n", 1) = 1
+++ exited with 2 +++
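These are raw syscalls, so you need some text processing to turn them back into something human-readable.
A script like this may help you.
But the basic idea is as follows. Let's look at the first few lines: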
write(2, "SIGQUIT: quit", 13) = 13
write(2, "\n", 1) = 1
write(2, "PC=", 3) = 3
write(2, "0x464ce1", 8) = 8
write(2, " m=", 3) = 3
write(2, "0", 1) = 1
write(2, " sigcode=", 9) = 9
write(2, "0", 1) = 1
write(2, "\n", 1) = 1
write(2, "\n", 1) = 1
write(2, "goroutine ", 10) = 10
write(2, "0", 1) = 1
write(2, " [", 2) = 2
write(2, "idle", 4) = 4
write(2, "]:\n", 3) = 3
write(2, "runtime.futex", 13) = 13
write(2, "(", 1) = 1
write(2, ")\n", 2) = 2
write(2, "\t", 1) = 1
write(2, "/usr/local/go/src/runtime/sys_linux_amd64.s", 43) = 43
write(2, ":", 1) = 1
write(2, "552", 3) = 3
write(2, " +", 2) = 2
write(2, "0x21", 4) = 4
write(2, "\n", 1) = 1
Just take the raw strings; you can even evaluate them as JavaScript in your browser's JS console, for example joining them back together with the + operator:
"SIGQUIT: quit" +
"\n" +
"PC=" +
"0x464ce1" +
" m=" +
"0" +
" sigcode=" +
"0" +
"\n" +
"\n" +
"goroutine " +
"0" +
" [" +
"idle" +
"]:\n" +
"runtime.futex" +
"(" +
")\n" +
"\t" +
"/usr/local/go/src/runtime/sys_linux_amd64.s" +
":" +
"552" +
" +" +
"0x21" +
"\n";
->
SIGQUIT: quit\nPC=0x464ce1 m=0 sigcode=0\n\ngoroutine 0 [idle]:\nruntime.futex()\n\t/usr/local/go/src/runtime/sys_linux_amd64.s:552 +0x21\n
Then replace \n with newlines and \t with tabs:
SIGQUIT: quit
PC=0x464ce1 m=0 sigcode=0

goroutine 0 [idle]:
runtime.futex()
	/usr/local/go/src/runtime/sys_linux_amd64.s:552 +0x21
So even though the data was being sent to the trash, we recovered the important data!