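While developing a C++ project recently, I hit a problem: after running for a while, the process suddenly exited with a segmentation fault. This post records how I debugged it with gdb and located the cause. The shell reported: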
[1] 21067 segmentation fault ./bin/test_process
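Debugging with gdb
I first checked the working directory and found that no core file had been generated. Machines usually do not produce core files by default (many Linux distributions disable core dumps out of the box). After running the following command in the shell and starting the process again, a core file was generated: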
ulimit -c unlimited
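The same limit can also be raised from inside the process at startup, which helps when the process is not launched from an interactive shell. Below is a minimal sketch, assuming a POSIX system. `enable_core_dumps` is a hypothetical helper name and is not part of the project described here; `getrlimit`, `setrlimit` and `RLIMIT_CORE` are the standard POSIX interfaces.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE (POSIX)
#include <cassert>

// Hypothetical helper: raise the soft core-file size limit to the hard limit
// for the current process. Returns true on success.
// An unprivileged process cannot go above the hard limit, so this matches
// "ulimit -c unlimited" only when the hard limit itself is unlimited.
bool enable_core_dumps() {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;  // could not read the current limits
    }
    lim.rlim_cur = lim.rlim_max;  // soft limit := hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling `enable_core_dumps()` early in `main` would be one way to use it. Where the core file ends up, and under which name, is still decided by the system configuration.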
Then start gdb in one of the following ways.
gdb + executable + core file:
gdb ./bin/test_process ./core.7455
Or gdb + executable, then execute run:
gdb ./bin/test_process
run
The following error was reported:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffec133700 (LWP 8282)]
0x00000000008e72ae in mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:266
266 test-group/test-process/src/group.cpp: No such file or directory.
in test-group/test-process/src/group.cpp
Run backtrace (or bt) to display the call stack:
(gdb) backtrace
#0 0x00000000008e72ae in mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:266
#1 0x00000000008e7d1a in mbdstrgy::push::GroupProcessor::process() (this=0x33bde10) at test-group/test-process/src/group.cpp:334
#2 0x00007ffff7df316f in ?? () from /opt/compiler/gcc-8.2/lib/libstdc++.so.6
#3 0x00007ffff7fbeda4 in start_thread () at pthread_create.c:333
#4 0x00007ffff7b5432d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
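Run frame to inspect a specific frame (the lines starting with #). Because gdb could not find the source file at the recorded path, the actual code is not displayed here:
(gdb) frame 0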
#0 mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:265
265 test-group/test-process/src/group.cpp: No such file or directory.
in test-group/test-process/src/group.cpp
Run print followed by a variable name to see the value of that variable in the current context. Printing lr_strategy_ptr here shows 0x0, that is, a null pointer:
(gdb) print lr_strategy_ptr
$1 = std::shared_ptr<mbdstrgy::push::Strategy> (empty) = {
get() = 0x0
}
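For reference, the state gdb reports above corresponds to a `std::shared_ptr` that owns nothing: `get()` returns a null pointer and `use_count()` is 0. The sketch below only illustrates that state. `Strategy` is a hypothetical stand-in for `mbdstrgy::push::Strategy`, whose real definition is not shown in this post, and `is_empty` is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for mbdstrgy::push::Strategy (real class not shown here).
struct Strategy {
    double weight = 1.0;
};

// Hypothetical helper: true when the pointer owns nothing and stores a null
// pointer, which matches the "(empty)" / "get() = 0x0" state printed by gdb.
bool is_empty(const std::shared_ptr<Strategy>& ptr) {
    return ptr.get() == nullptr && ptr.use_count() == 0;
}
```

A default-constructed `std::shared_ptr<Strategy>` is in this state, and dereferencing it with `->` or `*` is undefined behavior, which in practice often shows up as SIGSEGV.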
Run backtrace full to show, for every frame, the arguments passed between the calling functions together with the values of each function's local variables:
(gdb) backtrace full
#0 0x00000000008e72ae in mbdstrgy::push::GroupProcessor::compute_score(std::shared_ptr<mbdstrgy::push::OperatorPack> const&, std::shared_ptr<mbdstrgy::push::TaskScore>&) (this=Unhandled dwarf expression opcode 0xf3
) at test-group/test-process/src/group.cpp:266
lr_strategy_ptr = std::shared_ptr<mbdstrgy::push::Strategy> (empty) = {
get() = 0x0
}
adjust_ptr = std::shared_ptr<mbdstrgy::push::Rerank> (empty) = {
get() = 0x0
}
unique_key = <value optimized out>
nid = <value optimized out>
taskid = <value optimized out>
score = 8.1749536922780693e-315
iter = <value optimized out>
#1 0x00000000008e7d1a in mbdstrgy::push::GroupProcessor::process() (this=0x33bde10) at test-group/test-process/src/group.cpp:334
operator_pack = std::shared_ptr<mbdstrgy::push::OperatorPack> (use count 1, weak count 0) = {
get() = 0x56d6ea50
}
logeveryn_329 = 22891
logeveryn_sc_329 = 100
logeveryn_c_329 = <value optimized out>
new_task_score = std::shared_ptr<mbdstrgy::push::TaskScore> (use count 1, weak count 0) = {
get() = 0x541d57b0
}
count = 157
#2 0x00007ffff7df316f in ?? () from /opt/compiler/gcc-8.2/lib/libstdc++.so.6
No symbol table info available.
#3 0x00007ffff7fbeda4 in start_thread () at pthread_create.c:333
No symbol table info available.
#4 0x00007ffff7b5432d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
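Frame 0 contains two null pointers (lr_strategy_ptr and adjust_ptr), and operating on a null pointer is what caused the segmentation fault. Combined with the logs, the root cause turned out to be a faulty configuration entry, and with that the problem was solved.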
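One way to turn this kind of crash into a recoverable error is to check both pointers before using them. The sketch below is an assumption about what such a guard could look like, not the project's actual fix. `Strategy`, `Rerank`, their member functions, and `compute_score_checked` are all hypothetical; only the variable names `lr_strategy_ptr` and `adjust_ptr` come from the gdb output.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the real classes and their methods are not shown in this post.
struct Strategy {
    double predict(double feature) const { return feature * 2.0; }
};
struct Rerank {
    double adjust(double score) const { return score + 1.0; }
};

// Hypothetical guarded scoring function: returns false and leaves score_out
// untouched when either pointer is empty, instead of dereferencing it.
bool compute_score_checked(const std::shared_ptr<Strategy>& lr_strategy_ptr,
                           const std::shared_ptr<Rerank>& adjust_ptr,
                           double feature,
                           double& score_out) {
    if (!lr_strategy_ptr || !adjust_ptr) {
        return false;  // caller can log the problem and skip this item
    }
    score_out = adjust_ptr->adjust(lr_strategy_ptr->predict(feature));
    return true;
}
```

With a guard like this, a bad configuration would surface as a logged error from the caller rather than as a SIGSEGV in a worker thread.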
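This problem could actually have been caught by a unit test, so getting the unit tests to pass before doing manual self-testing is a good habit.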
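As a rough illustration of the kind of unit test meant here: the post does not show how the configuration led to the empty pointers, so the registry below is purely hypothetical. It assumes strategies are looked up by a name taken from the configuration, and that an unknown name yields an empty pointer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Strategy {};  // hypothetical placeholder for the real strategy class

// Hypothetical registry mapping a configured strategy name to an instance.
class StrategyRegistry {
public:
    void add(const std::string& name) {
        table_[name] = std::make_shared<Strategy>();
    }

    // Returns an empty shared_ptr when the name is not registered.
    std::shared_ptr<Strategy> find(const std::string& name) const {
        auto it = table_.find(name);
        if (it == table_.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::map<std::string, std::shared_ptr<Strategy>> table_;
};
```

A test could then assert that every name appearing in the configuration resolves to a non-empty pointer, for example `assert(registry.find("lr") != nullptr);` for a hypothetical strategy named `lr`.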