Analyzing a crash dump or vmcore file in Linux, step by step.
Today we will look at a very useful topic: how to analyze a vmcore file in Linux.
Let's see step by step how to achieve this.
1. Check the kernel version of our system.
[root@ngelinux001 127.0.0.1-2022-06-25-08:06:37]# uname -a
Linux ngelinux001 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
2. Check whether the crash rpm is installed.
[root@ngelinux001 127.0.0.1-2022-06-25-08:06:37]# rpm -qa | grep -i crash
crash-7.2.3-11.el7_9.1.x86_64
If it is not installed, install it using yum.
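Steps 1 and 2 can be combined into a small pre-flight check. The sketch below is one possible way to script it, not part of the original procedure: the helper name have_cmd is hypothetical, and it only tests whether a crash binary is on the PATH instead of querying the rpm database.

```shell
#!/bin/sh
# have_cmd: hypothetical helper, succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Kernel release of the running system (the third field of `uname -a`).
echo "Running kernel: $(uname -r)"

if have_cmd crash; then
    echo "crash is installed"
else
    echo "crash is not installed; install it with: yum install crash"
fi
```

The script only reports what it finds; the installation itself is left as a manual step so nothing is changed on the system unexpectedly.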
3. Install the kernel debuginfo packages that match the kernel version.
Since my yum repo does not have these packages, I downloaded them into my current directory and then ran this command.
[root@ngelinux001 ~]# yum install ./kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64.rpm ./kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64.rpm
Loaded plugins: aliases, changelog, fastestmirror, kabi, langpacks, tmprepo, verify, versionlock
Loading support for Red Hat kernel ABI
Examining ./kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64.rpm: kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64
Marking ./kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64.rpm to be installed
Examining ./kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64.rpm: kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64
Marking ./kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package kernel-debuginfo.x86_64 0:3.10.0-1160.42.2.el7 will be installed
---> Package kernel-debuginfo-common-x86_64.x86_64 0:3.10.0-1160.42.2.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================================================================
 Package                          Arch     Version                Repository                                                     Size
================================================================================================================================
Installing:
 kernel-debuginfo                 x86_64   3.10.0-1160.42.2.el7   /kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64                 2.1 G
 kernel-debuginfo-common-x86_64   x86_64   3.10.0-1160.42.2.el7   /kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64   294 M
Transaction Summary
================================================================================================================================
Install 2 Packages
Total size: 2.4 G
Installed size: 2.4 G
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64 1/2
Installing : kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64 [################################ ] 2/2
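The debuginfo packages must match the kernel release exactly, so it can help to generate the expected file names from the release string before downloading them. The sketch below assumes the RHEL 7 naming pattern seen in the yum command above; the helper name debuginfo_rpms is hypothetical, and other distributions may name their packages differently.

```shell
#!/bin/sh
# debuginfo_rpms: hypothetical helper that prints the two debuginfo RPM
# file names for a kernel release string such as the one from `uname -r`.
# Assumes the naming pattern of the RHEL 7 packages used in this article.
debuginfo_rpms() {
    rel="$1"
    arch="${rel##*.}"    # text after the last dot, e.g. x86_64
    echo "kernel-debuginfo-common-${arch}-${rel}.rpm"
    echo "kernel-debuginfo-${rel}.rpm"
}

# Example with the release used in this article.
debuginfo_rpms "3.10.0-1160.42.2.el7.x86_64"
```

On the crashed system itself, the argument could be replaced with "$(uname -r)", provided the system has been rebooted into the same kernel that produced the vmcore.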
4. Now start the crash command with the vmcore file collected in the crash dump directory and the vmlinux file.
[root@ngelinux001 127.0.0.1-2022-06-25-08:06:37]# crash ./vmcore /usr/lib/debug/lib/modules/3.10.0-1160.42.2.el7.x86_64/vmlinux
crash>
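When the same analysis has to be repeated for several dumps, the vmlinux path and the crash commands can be prepared in advance. The following is a sketch under assumptions: the helper name vmlinux_path is hypothetical, the path layout is the one from the command above, and the -i option (read commands from a file) should be confirmed against the crash man page for the installed version. The script only prints the crash invocation; it does not run it.

```shell
#!/bin/sh
# vmlinux_path: hypothetical helper that builds the debug vmlinux path
# for a kernel release, following the path used in step 4.
vmlinux_path() {
    echo "/usr/lib/debug/lib/modules/$1/vmlinux"
}

# Put the commands used in the following steps into a command file.
cmdfile="$(mktemp)"
printf '%s\n' 'bt' 'ps | grep -i 29187' 'exit' > "$cmdfile"

# Print the invocation for review instead of executing it.
echo "crash -i $cmdfile $(vmlinux_path 3.10.0-1160.42.2.el7.x86_64) ./vmcore"
```

Printing the command first makes it easy to check that the vmlinux path matches the kernel release recorded in the dump before crash spends time loading the debuginfo.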
5. Run a backtrace to identify what triggered the crash dump.
crash> bt
PID: 29187 TASK: ffff90fd72842100 CPU: 20 COMMAND: "php"
#0 [ffff911e15db7a50] machine_kexec at ffffffffa12662c4
#1 [ffff911e15db7ab0] __crash_kexec at ffffffffa1322a32
#2 [ffff911e15db7b80] crash_kexec at ffffffffa1322b20
#3 [ffff911e15db7b98] oops_end at ffffffffa198d798
#4 [ffff911e15db7bc0] no_context at ffffffffa1275d14
#5 [ffff911e15db7c10] __bad_area_nosemaphore at ffffffffa1275fe2
#6 [ffff911e15db7c60] bad_area_nosemaphore at ffffffffa1276104
#7 [ffff911e15db7c70] __do_page_fault at ffffffffa1990750
#8 [ffff911e15db7ce0] do_page_fault at ffffffffa1990975
#9 [ffff911e15db7d10] page_fault at ffffffffa198c778
[exception RIP: unknown or invalid address]
RIP: ffff911c3f6959e0 RSP: ffff911e15db7dc8 RFLAGS: 00010083
RAX: 0000000000000000 RBX: ffff911c3f6959e0 RCX: 0000000000000000
RDX: ffff911c3f6959a0 RSI: ffff911c3f6959f0 RDI: ffff911e15db7e80
RBP: ffffffffa12c9875 R8: 0000000000000000 R9: 0000000000000018
R10: 0000000062b72477 R11: 0000000000000246 R12: ffff911e15db7e80
R13: ffff911e15db7dd8 R14: ffff911c3f6959a0 R15: ffff911c3f6959a0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff911e15db7de0] hrtimer_start_range_ns at ffffffffa12c9f3e
#11 [ffff911e15db7e48] do_nanosleep at ffffffffa198834b
#12 [ffff911e15db7e78] hrtimer_nanosleep at ffffffffa12cad0b
#13 [ffff911e15db7f18] sys_nanosleep at ffffffffa12cae66
#14 [ffff911e15db7f50] system_call_fastpath at ffffffffa1995f92
RIP: 00007f272626f8d0 RSP: 00007ffcb26e5398 RFLAGS: 00000202
RAX: 0000000000000023 RBX: 000055e6b5830330 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcb26e53e0
RBP: 00007ffcb26e5480 R8: 0000000000000000 R9: 0000000000000018
R10: 0000000062b72477 R11: 0000000000000246 R12: 00007f2724e1ceb0
R13: 0000000000000000 R14: 000000000df03106 R15: 0000000000000000
ORIG_RAX: 0000000000000023 CS: 0033 SS: 002b
crash>
Above we can see that the php command, with PID 29187, was the task running when the crash dump was triggered.
6. Check the details of this PID.
crash> ps | grep -i 29187
> 29187 1 20 ffff90fd72842100 IN 0.0 399168 45176 php
29650 29187 44 ffff910413cba100 IN 0.0 181088 4864 ssh
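The grep above matches 29187 in any column, so it returns both the process itself and its children. If the ps output has been saved to a file, the children can be separated out by filtering on the PPID column. This is a sketch only: the helper name children_of is hypothetical, and it assumes the column order shown in this output (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM).

```shell
#!/bin/sh
# children_of: hypothetical helper; reads crash "ps" rows on stdin and
# prints "PID COMMAND" for every task whose PPID equals $1.
children_of() {
    awk -v ppid="$1" '
        { sub(/^>/, "") }               # drop the ">" active-task marker
        $2 == ppid { print $1, $NF }    # column 2 is PPID, last one is COMM
    '
}

# The two rows from the ps output above, used as sample input.
ps_rows='> 29187 1 20 ffff90fd72842100 IN 0.0 399168 45176 php
29650 29187 44 ffff910413cba100 IN 0.0 181088 4864 ssh'

printf '%s\n' "$ps_rows" | children_of 29187
```

For the sample rows this prints "29650 ssh", the ssh session started by the php process.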
7. Check how the memory at the task address is used.
crash> kmem ffff90fd72842100
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff90fdbfdbe300 task_struct 4216 2056 2926 418 32k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
fffff87684ca1000 ffff90fd72840000 0 7 7 0
FREE / [ALLOCATED]
[ffff90fd72842100]
PID: 29187
COMMAND: "php"
TASK: ffff90fd72842100 [THREAD_INFO: ffff911e15db4000]
CPU: 20
STATE: TASK_INTERRUPTIBLE (PANIC)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff87684ca1080 132842000 0 0 0 2fffff00008000 tail
crash>
8. Analysis Part
The system crashed while running the “php” command.
The data below is displayed when we start the crash command.
=====================
NODENAME: ngelinux001
RELEASE: 3.10.0-1160.42.2.el7.x86_64
PANIC: "BUG: unable to handle kernel paging request at ffff911c3f6959e0"
=====================
KERNEL: /usr/lib/debug/lib/modules/3.10.0-1160.42.2.el7.x86_64/vmlinux
DUMPFILE: ./vmcore [PARTIAL DUMP]
CPUS: 72
DATE: Sat Jun 25 08:06:31 2022
UPTIME: 22 days, 11:08:17
LOAD AVERAGE: 21.45, 23.17, 22.74
TASKS: 2052
NODENAME: ngelinux001
RELEASE: 3.10.0-1160.42.2.el7.x86_64
VERSION: #1 SMP Tue Sep 7 14:49:57 UTC 2021
MACHINE: x86_64 (2299 Mhz)
MEMORY: 255.9 GB
PANIC: "BUG: unable to handle kernel paging request at ffff911c3f6959e0"
PID: 29187
COMMAND: "php"
TASK: ffff90fd72842100 [THREAD_INFO: ffff911e15db4000]
CPU: 20
STATE: TASK_INTERRUPTIBLE (PANIC)
--- No details for this memory reference ---
crash> kmem 0xffff911c3f6959e0
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff876fffda540 1fff695000 0 0 1 2fffff00000400 reserved
--- No issues with kernel symbol referencing ---
crash> dis -rl 0xffff911c3f6959e0
dis: WARNING: ffff911c3f6959e0: no associated kernel symbol found
0xffff911c3f6959e0: movabs 0xffff911c3f6959,%al
crash>
crash> bt -v
No stack overflows detected
9. The crash occurred in the php process, which had started an ssh session as a child process.
> 29187 1 20 ffff90fd72842100 IN 0.0 399168 45176 php
29650 29187 44 ffff910413cba100 IN 0.0 181088 4864 ssh
PID PPID CPU TASK ST %MEM VSZ RSS COMM
> 29187 1 20 ffff90fd72842100 IN 0.0 399168 45176 php
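10. The kmem output for the php task address ffff90fd72842100 (ffff910413cba100 is the task address of its ssh child) shows an allocated task_struct object. Note that "tail" in the FLAGS column is a page flag marking a tail page of a compound page, not the tail command; the invalid memory reference is the address ffff911c3f6959e0 from the panic message.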
crash> kmem ffff90fd72842100
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff90fdbfdbe300 task_struct 4216 2056 2926 418 32k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
fffff87684ca1000 ffff90fd72840000 0 7 7 0
FREE / [ALLOCATED]
[ffff90fd72842100]
PID: 29187
COMMAND: "php"
TASK: ffff90fd72842100 [THREAD_INFO: ffff911e15db4000]
CPU: 20
STATE: TASK_INTERRUPTIBLE (PANIC)
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff87684ca1080 132842000 0 0 0 2fffff00008000 tail
crash>
11. Let's check the backtrace of the process.
PID: 29187 TASK: ffff90fd72842100 CPU: 20 COMMAND: "php"
#0 [ffff911e15db7a50] machine_kexec at ffffffffa12662c4
#1 [ffff911e15db7ab0] __crash_kexec at ffffffffa1322a32
#2 [ffff911e15db7b80] crash_kexec at ffffffffa1322b20
#3 [ffff911e15db7b98] oops_end at ffffffffa198d798
#4 [ffff911e15db7bc0] no_context at ffffffffa1275d14
#5 [ffff911e15db7c10] __bad_area_nosemaphore at ffffffffa1275fe2
#6 [ffff911e15db7c60] bad_area_nosemaphore at ffffffffa1276104
#7 [ffff911e15db7c70] __do_page_fault at ffffffffa1990750
#8 [ffff911e15db7ce0] do_page_fault at ffffffffa1990975
#9 [ffff911e15db7d10] page_fault at ffffffffa198c778
[exception RIP: unknown or invalid address]
RIP: ffff911c3f6959e0 RSP: ffff911e15db7dc8 RFLAGS: 00010083
RAX: 0000000000000000 RBX: ffff911c3f6959e0 RCX: 0000000000000000
RDX: ffff911c3f6959a0 RSI: ffff911c3f6959f0 RDI: ffff911e15db7e80
RBP: ffffffffa12c9875 R8: 0000000000000000 R9: 0000000000000018
R10: 0000000062b72477 R11: 0000000000000246 R12: ffff911e15db7e80
R13: ffff911e15db7dd8 R14: ffff911c3f6959a0 R15: ffff911c3f6959a0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#10 [ffff911e15db7de0] hrtimer_start_range_ns at ffffffffa12c9f3e
#11 [ffff911e15db7e48] do_nanosleep at ffffffffa198834b
#12 [ffff911e15db7e78] hrtimer_nanosleep at ffffffffa12cad0b
#13 [ffff911e15db7f18] sys_nanosleep at ffffffffa12cae66
#14 [ffff911e15db7f50] system_call_fastpath at ffffffffa1995f92
RIP: 00007f272626f8d0 RSP: 00007ffcb26e5398 RFLAGS: 00000202
RAX: 0000000000000023 RBX: 000055e6b5830330 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffcb26e53e0
RBP: 00007ffcb26e5480 R8: 0000000000000000 R9: 0000000000000018
R10: 0000000062b72477 R11: 0000000000000246 R12: 00007f2724e1ceb0
R13: 0000000000000000 R14: 000000000df03106 R15: 0000000000000000
ORIG_RAX: 0000000000000023 CS: 0033 SS: 002b
The CPU’s RIP (instruction pointer) register holds the address of the next instruction to be executed.
In this crash, however, RIP held ffff911c3f6959e0, which is not a valid instruction address: crash reports it as "unknown or invalid address" and finds no kernel symbol for it.
The fault was raised while the php process was inside the nanosleep system call, so the php workload is what triggered the faulting code path.
We need to reach out to PHP support with these details so that the bug can be identified for the specific PHP version and fixed accordingly. Because the exception frame shows that the fault occurred in kernel mode (CS: 0010), it is also worth sharing the same details with the kernel vendor.
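The backtrace in step 11 contains two register frames, one with CS: 0010 and one with CS: 0033. On x86, the low two bits of the CS selector give the privilege level the CPU was running at, which tells us whether a frame was captured in kernel mode (ring 0) or user mode (ring 3). The sketch below decodes those two values; the helper name cpl_of is hypothetical.

```shell
#!/bin/sh
# cpl_of: hypothetical helper; prints the privilege level encoded in the
# low two bits of an x86 CS selector given in hex (0 = kernel, 3 = user).
cpl_of() {
    echo $(( 0x$1 & 3 ))
}

echo "exception frame, CS 0010 -> ring $(cpl_of 0010)"
echo "syscall frame,   CS 0033 -> ring $(cpl_of 0033)"
```

The exception frame decodes to ring 0 and the system call frame to ring 3, so the invalid RIP was reached while the kernel was executing on behalf of php, after php had entered the kernel through the nanosleep system call.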