How to analyze memory in a kernel crash dump on Linux?
In this article, we look at how to analyze a core file (vmcore) generated after a system crash, using the crash utility.
The steps below extract basic information from the core file and build up an account of how the server's memory was being used.
Let's see how to get different memory data using different crash commands.
1. System Information
crash> sys
KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.33.1.rt56.621.el6rt.x86_64/vmlinux
DUMPFILE: /cores/retrace/tasks/747611042/crash/vmcore [PARTIAL DUMP]
CPUS: 40
DATE: Tue Jun 15 09:44:35 EDT 2021
UPTIME: 18 days, 04:16:22
LOAD AVERAGE: 1.53, 1.53, 1.51
TASKS: 797
NODENAME: ngelinux002
RELEASE: 3.10.0-693.33.1.rt56.621.el6rt.x86_64 <<<<<<<<<<<<< RT kernel
VERSION: #1 SMP PREEMPT RT Fri May 25 11:46:21 EDT 2018 <<<< kernel build date
MACHINE: x86_64 (2597 Mhz)
MEMORY: 63.9 GB
PANIC: "SysRq : Trigger a crash" <<<<<<< Manually panicked.
2. Get System Vendor/Hardware Information
crash> sys -i
DMI_BIOS_VENDOR: HP
DMI_BIOS_VERSION: P89
DMI_BIOS_DATE: 03/25/2019
DMI_SYS_VENDOR: HP
DMI_PRODUCT_NAME: ProLiant DL360 Gen9
DMI_PRODUCT_VERSION:
DMI_PRODUCT_SERIAL: CZJ51309LZ
DMI_PRODUCT_UUID: 32353537-3835-5A43-4A35-314430394C5D
DMI_BOARD_VENDOR: HP
DMI_BOARD_NAME: ProLiant DL360 Gen9
DMI_BOARD_VERSION:
DMI_BOARD_SERIAL: CZJ51309LZ
DMI_BOARD_ASSET_TAG:
DMI_CHASSIS_VENDOR: HP
DMI_CHASSIS_TYPE: 23
DMI_CHASSIS_VERSION:
DMI_CHASSIS_SERIAL: CZJ51309LZ
DMI_CHASSIS_ASSET_TAG:
3. Memory statistics
crash> kmem -i
PAGES TOTAL PERCENTAGE
TOTAL MEM 16419806 62.6 GB ----
FREE 1367564 5.2 GB 8% of TOTAL MEM
USED 15052242 57.4 GB 91% of TOTAL MEM <<<<<< 57.4G memory was used at the time of crash
SHARED 207840 811.9 MB 1% of TOTAL MEM
BUFFERS 173508 677.8 MB 1% of TOTAL MEM
CACHED 102194 399.2 MB 0% of TOTAL MEM
SLAB 5661516 21.6 GB 34% of TOTAL MEM <<<<< 21.6G was used in SLAB
TOTAL HUGE 0 0 ----
HUGE FREE 0 0 0% of TOTAL HUGE <<< nothing at HUGEPAGE
TOTAL SWAP 4194303 16 GB ----
SWAP USED 0 0 0% of TOTAL SWAP
SWAP FREE 4194303 16 GB 100% of TOTAL SWAP
COMMIT LIMIT 12404206 47.3 GB ----
COMMITTED 318538 1.2 GB 2% of TOTAL LIMIT
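The figures in this table can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming a 4 KiB base page size (the usual x86_64 default); the variable and function names are illustrative only, and the page counts are copied from the output above.

```python
# Page counts copied from the `kmem -i` output above.
PAGE_KIB = 4  # assumption: 4 KiB base pages on x86_64

total_pages = 16419806
free_pages = 1367564
used_pages = 15052242
slab_pages = 5661516


def pages_to_gib(pages):
    """Convert a page count to GiB."""
    return pages * PAGE_KIB / 1024 / 1024


def percent_of_total(pages):
    """Whole-number percentage of TOTAL MEM, truncated rather than rounded."""
    return pages * 100 // total_pages


# USED is simply TOTAL MEM minus FREE.
assert total_pages - free_pages == used_pages

for label, pages in [("TOTAL MEM", total_pages), ("FREE", free_pages),
                     ("USED", used_pages), ("SLAB", slab_pages)]:
    print(f"{label:10s} {pages_to_gib(pages):6.1f} GiB  {percent_of_total(pages):3d}%")
```

Truncating to a whole percent gives the same 8%, 91%, and 34% values that appear in the table.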
4. Get unreclaimable slab memory
crash> kmem -V|grep -i slab
NR_SLAB_RECLAIMABLE: 132533
NR_SLAB_UNRECLAIMABLE: 5528983 <<<<< ~21G is unreclaimable
SLABS_SCANNED: 1852416
DROP_SLAB: 3
5528983 pages * 4 KiB per page = 22115932 KiB
22115932 / 1024 / 1024 = 21.09 GiB of unreclaimable slab; the task_struct slab alone accounts for about 11 GiB
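As a sanity check, the two slab counters should add up to the SLAB row reported by kmem -i. Here is a small sketch of that check, again assuming 4 KiB pages; the names are illustrative and the counters are copied from the outputs above.

```python
PAGE_KIB = 4  # assumption: 4 KiB base pages on x86_64

# Counters copied from `kmem -V | grep -i slab` and the SLAB row of `kmem -i`.
nr_slab_reclaimable = 132533
nr_slab_unreclaimable = 5528983
slab_pages_from_kmem_i = 5661516


def pages_to_gib(pages):
    """Convert a page count to GiB."""
    return pages * PAGE_KIB / 1024 / 1024


# Reclaimable + unreclaimable should equal the total slab page count.
assert nr_slab_reclaimable + nr_slab_unreclaimable == slab_pages_from_kmem_i

unreclaimable_gib = pages_to_gib(nr_slab_unreclaimable)
unreclaimable_share = nr_slab_unreclaimable * 100 / slab_pages_from_kmem_i

print(f"unreclaimable slab: {unreclaimable_gib:.2f} GiB "
      f"({unreclaimable_share:.1f}% of all slab pages)")
```

Almost all of the slab memory in this dump is unreclaimable, which is why the per-cache breakdown is the next thing to look at.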
5. Check the largest slab caches.
crash> kmem -s | awk '{print $1,$2, $5*$6"k", $7}' | sort -nrk3 | column -t | head
ffff88085f55a100  6616  11363072k  task_struct  <<<<<<
ffff88085f403700  256   2702328k   kmalloc-256
ffff88085f55ad00  368   2271744k   filp
ffff88085f55a400  1280  1815840k   signal_cache
ffff88085f55a600  1048  1624224k   mm_struct
ffff88085f55a200  832   1165984k   task_xstate
ffff88085f403800  192   442256k    kmalloc-192
ffff88085f55b100  160   222952k    sigqueue
ffff88085f403900  128   202520k    kmalloc-128
ffff8810358a4500  944   153312k    ext3_inode_cache
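To put these per-cache sizes in perspective, the third column (slab count times slab size, in KiB) can be converted to GiB and summed. The sketch below does this for the ten rows shown above; the parsing helper is illustrative and assumes the four-column layout produced by the awk command.

```python
# The ten rows printed by the kmem -s pipeline above:
# cache address, object size, SLABS*SSIZE in KiB, cache name.
TOP_CACHES = """\
ffff88085f55a100 6616 11363072k task_struct
ffff88085f403700 256 2702328k kmalloc-256
ffff88085f55ad00 368 2271744k filp
ffff88085f55a400 1280 1815840k signal_cache
ffff88085f55a600 1048 1624224k mm_struct
ffff88085f55a200 832 1165984k task_xstate
ffff88085f403800 192 442256k kmalloc-192
ffff88085f55b100 160 222952k sigqueue
ffff88085f403900 128 202520k kmalloc-128
ffff8810358a4500 944 153312k ext3_inode_cache
"""


def parse_row(line):
    """Return (cache name, size in KiB) for one row of the awk output."""
    _address, _objsize, size_k, name = line.split()
    return name, int(size_k.rstrip("k"))


sizes_kib = dict(parse_row(line) for line in TOP_CACHES.splitlines())
total_kib = sum(sizes_kib.values())

for name, kib in sizes_kib.items():
    print(f"{name:20s} {kib / 1024**2:6.2f} GiB")
print(f"{'top 10 total':20s} {total_kib / 1024**2:6.2f} GiB")
```

The ten largest caches together come to roughly 21 GiB, so they explain nearly all of the slab usage, with task_struct alone contributing about half.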
6. Check for tainted modules.
crash> mod -t
no tainted modules
--
crash> sys -t
TAINTED_MASK: 0
7. Process states: Get Zombie processes.
crash> ps -S
RU: 42
IN: 755
As we can see above, there are no zombie processes: every task is either running (RU) or in interruptible sleep (IN).
8. Get total tasks count.
crash> ps|wc -l
798
Hence the total task count is 797, since one of the 798 lines is the header row. This matches the TASKS: 797 value in the sys output.
9. Total RSS usage of user-mode.
crash> ps -u -G | sed 's/>//g' | awk '{ total += $8 } END { printf "Total RSS of user-mode: %.02f GiB\n", total/2^20 }'
Total RSS of user-mode: 0.30 GiB
crash> ps -Gu | sed 's/^>//' | awk '{ m[$9]+=$8/1024 } END { for (item in m) { printf "%20s %10s MiB\n", item, m[item] } }' | sort -k 2 -r -n|head
jamaicavm_bin_m 115.355 MiB
puppet 59.5195 MiB
BESClient 11.5625 MiB
sshd 11.2891 MiB
durability 7.0625 MiB
monman 7.05859 MiB
networking 6.6875 MiB
ssmagent.bin 6.55469 MiB
rsyslogd 5.23438 MiB
batchman 5.00781 MiB
Hence the total RSS usage of user-mode is less than 1 GiB here.
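For readers who prefer to post-process saved ps output outside of crash, the same per-command aggregation can be sketched in Python. The sample rows below are hypothetical and only mimic the column layout that the awk commands rely on (RSS in KiB in column 8, command name in column 9); the function name is likewise illustrative.

```python
from collections import defaultdict

# Hypothetical sample in the column layout of the ps output:
# PID PPID CPU TASK ST %MEM VSZ RSS COMM (VSZ and RSS in KiB).
# A leading ">" marks a task that was active on a CPU.
SAMPLE_PS_OUTPUT = """\
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
  1200      1    3  ffff880000001000  IN   0.1  204800  51200  puppet
> 1300      1    7  ffff880000002000  RU   0.0   81920  10240  sshd
  1301   1300    2  ffff880000003000  IN   0.0   81920   2048  sshd
"""


def rss_by_command(ps_output):
    """Sum the RSS column (KiB) per command name, skipping header lines."""
    totals = defaultdict(int)
    for line in ps_output.splitlines():
        fields = line.lstrip("> ").split()
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # header or malformed line
        totals[fields[8]] += int(fields[7])
    return dict(totals)


totals_kib = rss_by_command(SAMPLE_PS_OUTPUT)
total_gib = sum(totals_kib.values()) / 2**20

for comm, kib in sorted(totals_kib.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{comm:>20s} {kib / 1024:10.2f} MiB")
print(f"Total RSS of user-mode: {total_gib:.2f} GiB")
```

Dividing the KiB total by 2^20 gives GiB, which is the same conversion the awk one-liner performs.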
10. Check allocation of kernel threads.
crash> ps -k -G | sed 's/>//g' | awk '{ total += $8 } END { printf "Total RSS of kernel threads: %.02f GiB\n", total/2^20 }'
Total RSS of kernel threads: 0.00 GiB
As we can see above, kernel threads account for no RSS.
11. Check the space allocated to page tables, kernel stacks, and dirty pages.
crash> kmem -V|grep -e NR_PAGETABLE -e NR_KERNEL_STACK -e NR_FILE_DIRTY
NR_FILE_DIRTY: 12595 <<<<<< less than 50MiB
NR_PAGETABLE: 1509 <<<<<<< less than 6MiB
NR_KERNEL_STACK: 1420330 <<<< ~5.4GiB
1420330 pages * 4 KiB per page = 5681320 KiB
5681320 / 1024 / 1024 = 5.42 GiB
As we can see above, around 5.5 GiB is used by page tables, kernel stacks, and dirty pages combined.
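The combined figure can be reproduced by converting all three counters at once. This is a rough sketch that follows the article's convention of treating each counter as a count of 4 KiB pages; whether every counter is really expressed in pages depends on the kernel version, so treat that as an assumption.

```python
PAGE_KIB = 4  # assumption: each counter is a number of 4 KiB pages

# Counters copied from the `kmem -V` output above.
counters = {
    "NR_FILE_DIRTY": 12595,
    "NR_PAGETABLE": 1509,
    "NR_KERNEL_STACK": 1420330,
}

# Convert each counter to MiB.
sizes_mib = {name: pages * PAGE_KIB / 1024 for name, pages in counters.items()}
total_gib = sum(sizes_mib.values()) / 1024

for name, mib in sizes_mib.items():
    print(f"{name:16s} {mib:10.1f} MiB")
print(f"{'total':16s} {total_gib:10.2f} GiB")
```

Under that assumption the kernel stack counter dominates, and the total rounds to about 5.5 GiB.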
12. Check the number of hugepages.
crash> kmem -h
HSTATE SIZE FREE TOTAL NAME
ffffffff81c368a0 2MB 0 0 hugepages-2048kB
As we can see above, no hugepages are configured.
13. Check the page_cgroup structure size.
crash> log|grep -e page_cgroup -e crash
[ 0.000000] Reserving 164MB of memory at 704MB for crashkernel (System RAM: 65410MB) <<<< crashkernel=164M
--
[ 0.000000] allocated 1073741824 bytes of page_cgroup <<<< 1GiB in page_cgroup
As we can see above, 1 GiB is used by the page_cgroup structures.
14. Summing up the memory accounted for in the analysis above:
Slab                                   21.6G
Cache + Buffers + Shared                1.8G
Hugepages                               0
Kernel stack + page tables + dirty      5.5G
page_cgroup                             1G
Zombie processes                        none
Allocations to kernel threads           none
Process RSS of userspace                0.3G
-----------------
Accounted                              30.2G
Total used                             57.4G
Unaccounted memory is ~27.2 GiB
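The final tally is simple addition and subtraction, sketched below with the rounded figures from the steps above. The dictionary keys are descriptive labels only.

```python
# Rounded figures (GiB) gathered in the earlier steps.
accounted_gib = {
    "slab": 21.6,
    "cache + buffers + shared": 1.8,
    "hugepages": 0.0,
    "kernel stack + page tables + dirty": 5.5,
    "page_cgroup": 1.0,
    "user-mode RSS": 0.3,
}
used_gib = 57.4  # USED row of `kmem -i`

# Round to one decimal to avoid floating-point noise in the sums.
accounted_total = round(sum(accounted_gib.values()), 1)
unaccounted = round(used_gib - accounted_total, 1)

print(f"accounted:   {accounted_total} GiB")
print(f"unaccounted: {unaccounted} GiB")
```

Roughly 27 GiB of the used memory is not explained by any of the categories checked here.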
