Analyzing a crash dump (vmcore file) in Linux, step by step.

In this article we will see how to analyze a vmcore file in Linux with the crash utility.

Let's see step by step how to achieve this.

1. Check the kernel version of the system.

[root@ngelinux001 127.0.0.1-2022-06-25-08:06:37]# uname -a
Linux ngelinux001 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

 

2. Check whether the crash rpm is installed.

[root@ngelinux001 127.0.0.1-2022-06-25-08:06:37]# rpm -qa | grep -i crash
crash-7.2.3-11.el7_9.1.x86_64

If it is not installed, install it using yum.
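
The check and the install can be combined in one small shell function. This is only a minimal sketch, assuming a yum-based system and root privileges; the function name ensure_crash is made up for illustration and is not part of any package.

```shell
# ensure_crash: hypothetical helper name, used here for illustration only.
# Installs the crash utility only when rpm reports it as missing.
ensure_crash() {
    if rpm -q crash >/dev/null 2>&1; then
        echo "crash is already installed"
    else
        yum install -y crash
    fi
}
```

Calling ensure_crash as root should either print the message or hand over to yum.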

 

3. Install the kernel debuginfo packages that exactly match the kernel version.
Since my yum repo does not have these packages, I downloaded them into my current directory and then ran this command.

[root@ngelinux001 ~]# yum install ./kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64.rpm ./kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64.rpm
Loaded plugins: aliases, changelog, fastestmirror, kabi, langpacks, tmprepo, verify, versionlock
Loading support for Red Hat kernel ABI
Examining ./kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64.rpm: kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64
Marking ./kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64.rpm to be installed
Examining ./kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64.rpm: kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64
Marking ./kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package kernel-debuginfo.x86_64 0:3.10.0-1160.42.2.el7 will be installed
---> Package kernel-debuginfo-common-x86_64.x86_64 0:3.10.0-1160.42.2.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================================================================
 Package                        Arch   Version                Repository                                                   Size
================================================================================================================================
Installing:
 kernel-debuginfo               x86_64 3.10.0-1160.42.2.el7   /kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64               2.1 G
 kernel-debuginfo-common-x86_64 x86_64 3.10.0-1160.42.2.el7   /kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64 294 M

Transaction Summary
================================================================================================================================
Install  2 Packages

Total size: 2.4 G
Installed size: 2.4 G
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : kernel-debuginfo-common-x86_64-3.10.0-1160.42.2.el7.x86_64                                                   1/2
  Installing : kernel-debuginfo-3.10.0-1160.42.2.el7.x86_64 [################################                             ] 2/2
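
The two package names follow directly from the kernel release string, so they can be derived instead of typed by hand. The sketch below assumes the naming pattern seen in the yum command above; the function names debuginfo_pkgs and vmlinux_path are hypothetical.

```shell
# debuginfo_pkgs: hypothetical helper that prints the two debuginfo package
# names for a kernel release string such as the output of `uname -r`.
debuginfo_pkgs() {
    local release="$1"
    local arch="${release##*.}"   # text after the last dot, e.g. x86_64
    echo "kernel-debuginfo-common-${arch}-${release}"
    echo "kernel-debuginfo-${release}"
}

# vmlinux_path: hypothetical helper that prints where the debuginfo package
# places vmlinux, following the path used in the crash invocation in this article.
vmlinux_path() {
    echo "/usr/lib/debug/lib/modules/$1/vmlinux"
}
```

For example, debuginfo_pkgs "$(uname -r)" prints the names for the running kernel. If the system has since booted a different kernel, pass the release of the kernel that produced the vmcore instead.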

 

4. Now start the crash command with the vmcore collected in the crash dump directory and the vmlinux file from the debuginfo package.

[root@ngelinux001 127.0.0.1-2022-06-25-08:06:37]# crash ./vmcore /usr/lib/debug/lib/modules/3.10.0-1160.42.2.el7.x86_64/vmlinux
crash>
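
The same session can also be run without typing at the crash> prompt, which helps when the output has to be attached to a support case. The sketch below assumes crash's -i option (read commands from a file) and its support for redirecting command output to a file; the helper name write_crash_cmds and the output file names are only illustrative.

```shell
# write_crash_cmds: hypothetical helper that writes a crash command file.
# crash itself redirects each command's output to a text file; the final
# "exit" ends the session so nothing waits for keyboard input.
write_crash_cmds() {
    local cmdfile="$1"
    cat > "$cmdfile" <<'EOF'
sys > sys.txt
bt > bt.txt
ps > ps.txt
log > log.txt
exit
EOF
}
```

After write_crash_cmds /tmp/crash.cmds, running crash -s -i /tmp/crash.cmds ./vmcore /usr/lib/debug/lib/modules/3.10.0-1160.42.2.el7.x86_64/vmlinux should leave the four text files in the current directory.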

 

5. Get the backtrace of the crash to identify what triggered the crash dump.

crash> bt
PID: 29187  TASK: ffff90fd72842100  CPU: 20  COMMAND: "php"
 #0 [ffff911e15db7a50] machine_kexec at ffffffffa12662c4
 #1 [ffff911e15db7ab0] __crash_kexec at ffffffffa1322a32
 #2 [ffff911e15db7b80] crash_kexec at ffffffffa1322b20
 #3 [ffff911e15db7b98] oops_end at ffffffffa198d798
 #4 [ffff911e15db7bc0] no_context at ffffffffa1275d14
 #5 [ffff911e15db7c10] __bad_area_nosemaphore at ffffffffa1275fe2
 #6 [ffff911e15db7c60] bad_area_nosemaphore at ffffffffa1276104
 #7 [ffff911e15db7c70] __do_page_fault at ffffffffa1990750
 #8 [ffff911e15db7ce0] do_page_fault at ffffffffa1990975
 #9 [ffff911e15db7d10] page_fault at ffffffffa198c778
    [exception RIP: unknown or invalid address]
    RIP: ffff911c3f6959e0  RSP: ffff911e15db7dc8  RFLAGS: 00010083
    RAX: 0000000000000000  RBX: ffff911c3f6959e0  RCX: 0000000000000000
    RDX: ffff911c3f6959a0  RSI: ffff911c3f6959f0  RDI: ffff911e15db7e80
    RBP: ffffffffa12c9875   R8: 0000000000000000   R9: 0000000000000018
    R10: 0000000062b72477  R11: 0000000000000246  R12: ffff911e15db7e80
    R13: ffff911e15db7dd8  R14: ffff911c3f6959a0  R15: ffff911c3f6959a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff911e15db7de0] hrtimer_start_range_ns at ffffffffa12c9f3e
#11 [ffff911e15db7e48] do_nanosleep at ffffffffa198834b
#12 [ffff911e15db7e78] hrtimer_nanosleep at ffffffffa12cad0b
#13 [ffff911e15db7f18] sys_nanosleep at ffffffffa12cae66
#14 [ffff911e15db7f50] system_call_fastpath at ffffffffa1995f92
    RIP: 00007f272626f8d0  RSP: 00007ffcb26e5398  RFLAGS: 00000202
    RAX: 0000000000000023  RBX: 000055e6b5830330  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007ffcb26e53e0
    RBP: 00007ffcb26e5480   R8: 0000000000000000   R9: 0000000000000018
    R10: 0000000062b72477  R11: 0000000000000246  R12: 00007f2724e1ceb0
    R13: 0000000000000000  R14: 000000000df03106  R15: 0000000000000000
    ORIG_RAX: 0000000000000023  CS: 0033  SS: 002b
crash>

Above we can see that the php command, with PID 29187, was the task running on CPU 20 when the crash dump was triggered.
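
A backtrace is easier to read as a plain call chain: the highest frame number is the oldest call (here the system call entry), and frame #0 is the last function that ran before the dump was taken. The sketch below assumes the bt output was saved to a text file; the helper name bt_functions is hypothetical.

```shell
# bt_functions: hypothetical helper that reads crash "bt" output on stdin
# and prints one function name per stack frame, in frame order.
bt_functions() {
    # Frame lines look like:
    #  #7 [ffff911e15db7c70] __do_page_fault at ffffffffa1990750
    # Register lines and the "[exception RIP: ...]" line do not match.
    awk '$1 ~ /^#[0-9]+$/ && $2 ~ /^\[/ { print $3 }'
}
```

With the trace saved as bt.txt, bt_functions < bt.txt lists machine_kexec first and system_call_fastpath last.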

 

6. Check the PID details.

crash> ps | grep -i 29187
> 29187      1  20  ffff90fd72842100  IN   0.0  399168  45176  php
  29650  29187  44  ffff910413cba100  IN   0.0  181088   4864  ssh
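
A plain grep for 29187 matches that number in any column, for example in VSZ or RSS. Where that becomes a problem, the saved ps output can be filtered on the PID and PPID columns only. This is a sketch under the assumption that the ps output was saved to a text file; the helper name ps_family is hypothetical.

```shell
# ps_family: hypothetical helper that reads crash "ps" output on stdin and
# prints only the task with the given PID and its direct children.
ps_family() {
    awk -v pid="$1" '
        {
            line = $0
            sub(/^>/, "")          # tasks active on a CPU are prefixed with ">"
            if ($1 == pid || $2 == pid) print line
        }'
}
```

For example, ps_family 29187 < ps.txt prints the php line and the ssh line shown above.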

 

7. Check the memory details of the task address to see where it is used.

crash> kmem ffff90fd72842100
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff90fdbfdbe300 task_struct             4216       2056      2926    418    32k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  fffff87684ca1000  ffff90fd72840000     0      7          7     0
  FREE / [ALLOCATED]
  [ffff90fd72842100]

    PID: 29187
COMMAND: "php"
   TASK: ffff90fd72842100  [THREAD_INFO: ffff911e15db4000]
    CPU: 20
  STATE: TASK_INTERRUPTIBLE (PANIC)

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff87684ca1080  132842000                0        0  0 2fffff00008000 tail
crash>

 

8. Analysis

The system crashed while running the "php" command.
The data below is displayed when we start the crash command.

=====================
NODENAME: ngelinux001
RELEASE: 3.10.0-1160.42.2.el7.x86_64
PANIC: "BUG: unable to handle kernel paging request at ffff911c3f6959e0"
=====================
KERNEL: /usr/lib/debug/lib/modules/3.10.0-1160.42.2.el7.x86_64/vmlinux
DUMPFILE: ./vmcore [PARTIAL DUMP]
CPUS: 72
DATE: Sat Jun 25 08:06:31 2022
UPTIME: 22 days, 11:08:17
LOAD AVERAGE: 21.45, 23.17, 22.74
TASKS: 2052
NODENAME: ngelinux001
RELEASE: 3.10.0-1160.42.2.el7.x86_64
VERSION: #1 SMP Tue Sep 7 14:49:57 UTC 2021
MACHINE: x86_64 (2299 Mhz)
MEMORY: 255.9 GB
PANIC: "BUG: unable to handle kernel paging request at ffff911c3f6959e0"
PID: 29187
COMMAND: "php"
TASK: ffff90fd72842100 [THREAD_INFO: ffff911e15db4000]
CPU: 20
STATE: TASK_INTERRUPTIBLE (PANIC)

--- No details for this memory reference, only a reserved page ---
crash> kmem 0xffff911c3f6959e0
      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff876fffda540 1fff695000                0        0  1 2fffff00000400 reserved

--- No kernel symbol is associated with this address ---
crash> dis -rl 0xffff911c3f6959e0
dis: WARNING: ffff911c3f6959e0: no associated kernel symbol found
0xffff911c3f6959e0: movabs 0xffff911c3f6959,%al
crash>

crash> bt -v
No stack overflows detected

 

9. The crash happened in the php process, which had started an ssh session as a child process.

   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
> 29187      1  20  ffff90fd72842100  IN   0.0  399168  45176  php
  29650  29187  44  ffff910413cba100  IN   0.0  181088   4864  ssh

 

10. Re-check the memory of the crashed task ffff90fd72842100 (php). In the FLAGS column, "tail" is a page flag that marks a tail page of a compound page; it does not mean that the ssh child (task ffff910413cba100) ran a tail command.

crash> kmem ffff90fd72842100
CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
ffff90fdbfdbe300 task_struct             4216       2056      2926    418    32k
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  fffff87684ca1000  ffff90fd72840000     0      7          7     0
  FREE / [ALLOCATED]
  [ffff90fd72842100]

    PID: 29187
COMMAND: "php"
   TASK: ffff90fd72842100  [THREAD_INFO: ffff911e15db4000]
    CPU: 20
  STATE: TASK_INTERRUPTIBLE (PANIC)

      PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
fffff87684ca1080  132842000                0        0  0 2fffff00008000 tail
crash>

 

11. Let's check the backtrace of the process again.

PID: 29187  TASK: ffff90fd72842100  CPU: 20  COMMAND: "php"
 #0 [ffff911e15db7a50] machine_kexec at ffffffffa12662c4
 #1 [ffff911e15db7ab0] __crash_kexec at ffffffffa1322a32
 #2 [ffff911e15db7b80] crash_kexec at ffffffffa1322b20
 #3 [ffff911e15db7b98] oops_end at ffffffffa198d798
 #4 [ffff911e15db7bc0] no_context at ffffffffa1275d14
 #5 [ffff911e15db7c10] __bad_area_nosemaphore at ffffffffa1275fe2
 #6 [ffff911e15db7c60] bad_area_nosemaphore at ffffffffa1276104
 #7 [ffff911e15db7c70] __do_page_fault at ffffffffa1990750
 #8 [ffff911e15db7ce0] do_page_fault at ffffffffa1990975
 #9 [ffff911e15db7d10] page_fault at ffffffffa198c778
    [exception RIP: unknown or invalid address]
    RIP: ffff911c3f6959e0  RSP: ffff911e15db7dc8  RFLAGS: 00010083
    RAX: 0000000000000000  RBX: ffff911c3f6959e0  RCX: 0000000000000000
    RDX: ffff911c3f6959a0  RSI: ffff911c3f6959f0  RDI: ffff911e15db7e80
    RBP: ffffffffa12c9875   R8: 0000000000000000   R9: 0000000000000018
    R10: 0000000062b72477  R11: 0000000000000246  R12: ffff911e15db7e80
    R13: ffff911e15db7dd8  R14: ffff911c3f6959a0  R15: ffff911c3f6959a0
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffff911e15db7de0] hrtimer_start_range_ns at ffffffffa12c9f3e
#11 [ffff911e15db7e48] do_nanosleep at ffffffffa198834b
#12 [ffff911e15db7e78] hrtimer_nanosleep at ffffffffa12cad0b
#13 [ffff911e15db7f18] sys_nanosleep at ffffffffa12cae66
#14 [ffff911e15db7f50] system_call_fastpath at ffffffffa1995f92
    RIP: 00007f272626f8d0  RSP: 00007ffcb26e5398  RFLAGS: 00000202
    RAX: 0000000000000023  RBX: 000055e6b5830330  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007ffcb26e53e0
    RBP: 00007ffcb26e5480   R8: 0000000000000000   R9: 0000000000000018
    R10: 0000000062b72477  R11: 0000000000000246  R12: 00007f2724e1ceb0
    R13: 0000000000000000  R14: 000000000df03106  R15: 0000000000000000
    ORIG_RAX: 0000000000000023  CS: 0033  SS: 002b

The CPU's RIP (instruction pointer) register holds the address of the next instruction to be executed.

In this crash, instead of the address of a valid instruction, RIP held ffff911c3f6959e0 while the kernel was handling the nanosleep system call for php. crash reports this value as an "unknown or invalid address", and it is the same address that appears in the PANIC message.

This kind of misbehaviour shows up as a page fault in the context of the "php" program, caused by a jump to an invalid instruction address.

We need to reach out to PHP support with these details so that the bug can be identified against the specific PHP version and fixed accordingly.
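
The comparison between the exception RIP and the address in the PANIC message can be scripted from the saved bt output. When the two values are equal, that suggests the CPU faulted while fetching its next instruction from that address. This is only a sketch; the helper names exception_rip and fault_is_instruction_fetch are hypothetical.

```shell
# exception_rip: hypothetical helper that reads crash "bt" output on stdin and
# prints the RIP value recorded right after the "[exception RIP: ...]" line.
exception_rip() {
    awk '/\[exception RIP:/ { want = 1; next }
         want && $1 == "RIP:" { print $2; exit }'
}

# fault_is_instruction_fetch: hypothetical helper that succeeds when the
# faulting address from the PANIC message equals the exception RIP.
fault_is_instruction_fetch() {
    local panic_addr="$1" rip="$2"
    [ "$panic_addr" = "$rip" ]
}
```

With the trace saved as bt.txt, fault_is_instruction_fetch ffff911c3f6959e0 "$(exception_rip < bt.txt)" returns success for this dump.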
