Debugging a C++ segmentation fault with gdb

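While working on a C++ project recently, I ran into a problem: the process ran for a while and then exited abruptly with a segmentation fault. This post records how I used gdb to debug it and track down the cause.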

[1]    21067 segmentation fault  ./bin/test_process

Debugging with gdb

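I first checked the working directory and found that no core file had been generated. Many Linux distributions disable core dumps by default. After running the following command and starting the process again from the same shell, a core file was generated: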

ulimit -c unlimited
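
The same limit can also be raised from inside the process, which may help when the program is not launched from a shell you control. The following is a minimal sketch and not code from the original project: `enable_core_dumps` is a hypothetical helper name, and it only raises the soft limit up to the existing hard limit through the POSIX `getrlimit`/`setrlimit` calls.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not from the original project): raise the soft
// core-file size limit to the hard limit for the calling process.
// Returns true on success.
bool enable_core_dumps() {
    struct rlimit limit {};
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;  // could not read the current limit
    }
    limit.rlim_cur = limit.rlim_max;  // the soft limit may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

Calling it early in `main()` has roughly the effect of `ulimit -c unlimited` when the hard limit is itself unlimited. Where the core file is written still depends on system settings.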

Start gdb in one of the following ways:

# gdb + executable + core file
gdb ./bin/test_process ./core.7455

# or gdb + executable, then execute run inside gdb
gdb ./bin/test_process
run

gdb reported the following error:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffec133700 (LWP 8282)]
0x00000000008e72ae in mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:266
266 test-group/test-process/src/group.cpp: No such file or directory.
in test-group/test-process/src/group.cpp

Run backtrace (or bt) to print the call stack:

(gdb) backtrace
#0 0x00000000008e72ae in mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:266
#1 0x00000000008e7d1a in mbdstrgy::push::GroupProcessor::process() (this=0x33bde10) at test-group/test-process/src/group.cpp:334
#2 0x00007ffff7df316f in ?? () from /opt/compiler/gcc-8.2/lib/libstdc++.so.6
#3 0x00007ffff7fbeda4 in start_thread () at pthread_create.c:333
#4 0x00007ffff7b5432d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

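Run frame followed by a frame number to select a specific frame (the lines starting with # in the backtrace). Because gdb could not find the source file at the recorded path, the actual source line is not shown here: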

(gdb) frame 0
#0 mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:265
265 test-group/test-process/src/group.cpp: No such file or directory.
in test-group/test-process/src/group.cpp

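Run print followed by a variable name to inspect that variable's value in the current context. Printing lr_strategy_ptr shows that get() is 0x0, so it is a null pointer: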

(gdb) print lr_strategy_ptr
$1 = std::shared_ptr<mbdstrgy::push::Strategy> (empty) = {
get() = 0x0
}
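
An `(empty)` shared_ptr whose `get()` is `0x0` owns no object, so calling a member function through it is undefined behavior and typically ends in SIGSEGV. A minimal sketch of a guard follows. It assumes a hypothetical `Strategy` type with a `score()` method and a hypothetical `compute_score_checked` function; the real `mbdstrgy::push::Strategy` interface is not shown in this post.

```cpp
#include <memory>

// Hypothetical stand-in for the real strategy class.
struct Strategy {
    double score() const { return 0.5; }
};

// Writes the score to `out` and returns true. If `strategy` is empty,
// returns false without dereferencing it and leaves `out` unchanged.
bool compute_score_checked(const std::shared_ptr<Strategy>& strategy, double& out) {
    if (!strategy) {
        return false;  // empty shared_ptr: get() == nullptr
    }
    out = strategy->score();
    return true;
}
```

With a check like this, a missing strategy becomes an error the caller can log and handle instead of a crash.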

Run backtrace full to print the complete call stack, including the arguments passed between functions and the local variables of each frame:

(gdb) backtrace full
#0 0x00000000008e72ae in mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:266
lr_strategy_ptr = std::shared_ptr<mbdstrgy::push::Strategy> (empty) = {
get() = 0x0
}
adjust_ptr = std::shared_ptr<mbdstrgy::push::Rerank> (empty) = {
get() = 0x0
}
unique_key = <value optimized out>
nid = <value optimized out>
taskid = <value optimized out>
score = 8.1749536922780693e-315
iter = <value optimized out>
#1 0x00000000008e7d1a in mbdstrgy::push::GroupProcessor::process() (this=0x33bde10) at test-group/test-process/src/group.cpp:334
operator_pack = std::shared_ptr<mbdstrgy::push::OperatorPack> (use count 1, weak count 0) = {
get() = 0x56d6ea50
}
logeveryn_329 = 22891
logeveryn_sc_329 = 100
logeveryn_c_329 = <value optimized out>
new_task_score = std::shared_ptr<mbdstrgy::push::TaskScore> (use count 1, weak count 0) = {
get() = 0x541d57b0
}
count = 157
#2 0x00007ffff7df316f in ?? () from /opt/compiler/gcc-8.2/lib/libstdc++.so.6
No symbol table info available.
#3 0x00007ffff7fbeda4 in start_thread () at pthread_create.c:333
No symbol table info available.
#4 0x00007ffff7b5432d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

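Frame 0 contains two null pointers (lr_strategy_ptr and adjust_ptr), and operating on a null pointer caused the segmentation fault. Combined with the logs, the root cause turned out to be a faulty configuration entry, and with that the problem was resolved.
This bug could have been caught in unit tests, so getting the unit tests to pass before moving on to manual testing is a good habit.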
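
As a rough sketch of the kind of check a unit test could make, assume a hypothetical factory `make_strategy` that builds a strategy from a configuration map and returns an empty pointer for an unknown name. None of these names (`Config`, `LrStrategy`, `make_strategy`, `config_yields_strategy`, the `"strategy"` key) come from the original project.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical types, for illustration only.
struct Strategy {
    virtual ~Strategy() = default;
};
struct LrStrategy : Strategy {};

using Config = std::map<std::string, std::string>;

// Returns a strategy for a recognized "strategy" entry, or an empty
// pointer when the entry is missing or names an unknown strategy.
std::shared_ptr<Strategy> make_strategy(const Config& config) {
    auto it = config.find("strategy");
    if (it != config.end() && it->second == "lr") {
        return std::make_shared<LrStrategy>();
    }
    return nullptr;
}

// The property a unit test would assert for every shipped configuration.
bool config_yields_strategy(const Config& config) {
    return make_strategy(config) != nullptr;
}
```

A test that runs `config_yields_strategy` against the real configuration would fail on a misspelled or missing entry long before the process reaches the dereference.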
