How to analyze memory in a kernel crash dump on Linux?
In this article, we will look at how to analyze a core file (vmcore) generated after a system crash.
It shows how to extract basic information from the core file and build a memory analysis of the server.
Let's see how to get different memory data using different commands at the crash> prompt.
1. System Information
crash> sys
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.33.1.rt56.621.el6rt.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/747611042/crash/vmcore  [PARTIAL DUMP]
        CPUS: 40
        DATE: Tue Jun 15 09:44:35 EDT 2021
      UPTIME: 18 days, 04:16:22
LOAD AVERAGE: 1.53, 1.53, 1.51
       TASKS: 797
    NODENAME: ngelinux002
     RELEASE: 3.10.0-693.33.1.rt56.621.el6rt.x86_64   <<<< RT kernel
     VERSION: #1 SMP PREEMPT RT Fri May 25 11:46:21 EDT 2018   <<<< kernel build date
     MACHINE: x86_64  (2597 Mhz)
      MEMORY: 63.9 GB
       PANIC: "SysRq : Trigger a crash"   <<<< Manually panicked.
2. Get System Vendor/Hardware Information
crash> sys -i
        DMI_BIOS_VENDOR: HP
       DMI_BIOS_VERSION: P89
          DMI_BIOS_DATE: 03/25/2019
         DMI_SYS_VENDOR: HP
       DMI_PRODUCT_NAME: ProLiant DL360 Gen9
    DMI_PRODUCT_VERSION:
     DMI_PRODUCT_SERIAL: CZJ51309LZ
       DMI_PRODUCT_UUID: 32353537-3835-5A43-4A35-314430394C5D
       DMI_BOARD_VENDOR: HP
         DMI_BOARD_NAME: ProLiant DL360 Gen9
      DMI_BOARD_VERSION:
       DMI_BOARD_SERIAL: CZJ51309LZ
    DMI_BOARD_ASSET_TAG:
     DMI_CHASSIS_VENDOR: HP
       DMI_CHASSIS_TYPE: 23
    DMI_CHASSIS_VERSION:
     DMI_CHASSIS_SERIAL: CZJ51309LZ
  DMI_CHASSIS_ASSET_TAG:
3. Memory statistics.
crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  16419806      62.6 GB         ----
         FREE   1367564       5.2 GB    8% of TOTAL MEM
         USED  15052242      57.4 GB   91% of TOTAL MEM   <<<< 57.4G memory was used at the time of crash
       SHARED    207840     811.9 MB    1% of TOTAL MEM
      BUFFERS    173508     677.8 MB    1% of TOTAL MEM
       CACHED    102194     399.2 MB    0% of TOTAL MEM
         SLAB   5661516      21.6 GB   34% of TOTAL MEM   <<<< 21.6G was used in SLAB

   TOTAL HUGE         0            0         ----
    HUGE FREE         0            0    0% of TOTAL HUGE   <<<< nothing at HUGEPAGE

   TOTAL SWAP   4194303        16 GB         ----
    SWAP USED         0            0    0% of TOTAL SWAP
    SWAP FREE   4194303        16 GB  100% of TOTAL SWAP

 COMMIT LIMIT  12404206      47.3 GB         ----
    COMMITTED    318538       1.2 GB    2% of TOTAL LIMIT
4. Get unreclaimable memory
crash> kmem -V | grep -i slab
  NR_SLAB_RECLAIMABLE: 132533
NR_SLAB_UNRECLAIMABLE: 5528983   <<<< ~21G is unreclaimable
        SLABS_SCANNED: 1852416
            DROP_SLAB: 3
~~~
5528983*4
22115932
./1024/1024
21.09139633178710937500   <<<< in GiB, unreclaimable SLAB
~~~
--> task_struct slab is at 11G
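The bc arithmetic above can also be reproduced in a few lines of Python. This is only a minimal sketch, assuming 4 KiB pages (the same factor of 4 used in the calculation above); the helper name `pages_to_gib` is made up for illustration. It also shows that the two slab counters add up to the SLAB page count reported by `kmem -i` in step 3.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB base pages, as in the bc calculation


def pages_to_gib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to GiB."""
    return pages * page_size_kib / 1024 / 1024


# Counters copied from the kmem -V output above.
nr_slab_reclaimable = 132533
nr_slab_unreclaimable = 5528983

# The sum should equal the SLAB page count from kmem -i (5661516).
slab_pages = nr_slab_reclaimable + nr_slab_unreclaimable
unreclaimable_share = 100 * nr_slab_unreclaimable / slab_pages

print(f"reclaimable:   {pages_to_gib(nr_slab_reclaimable):6.2f} GiB")
print(f"unreclaimable: {pages_to_gib(nr_slab_unreclaimable):6.2f} GiB "
      f"({unreclaimable_share:.1f}% of slab)")
```

Almost all of the slab memory is in the unreclaimable counter, which is why the next step looks at individual slab caches.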
5. Check the highest memory consuming slab caches.
crash> kmem -s | awk '{print $1,$2, $5*$6"k", $7}' | sort -nrk3 | column -t | head
ffff88085f55a100  6616  11363072k  task_struct   <<<<
ffff88085f403700  256   2702328k   kmalloc-256
ffff88085f55ad00  368   2271744k   filp
ffff88085f55a400  1280  1815840k   signal_cache
ffff88085f55a600  1048  1624224k   mm_struct
ffff88085f55a200  832   1165984k   task_xstate
ffff88085f403800  192   442256k    kmalloc-192
ffff88085f55b100  160   222952k    sigqueue
ffff88085f403900  128   202520k    kmalloc-128
ffff8810358a4500  944   153312k    ext3_inode_cache
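To put the per-cache sizes into perspective, the sketch below converts the first five rows of the listing above to GiB and compares them with the total slab usage. It is illustrative only: the function name `parse_row` is hypothetical, and the slab total is taken as 5661516 pages of 4 KiB from `kmem -i`, which is an assumption about the page size.

```python
# First five rows of the sorted kmem -s listing (address, object size, size, name).
TOP_CACHES = """\
ffff88085f55a100 6616 11363072k task_struct
ffff88085f403700 256 2702328k kmalloc-256
ffff88085f55ad00 368 2271744k filp
ffff88085f55a400 1280 1815840k signal_cache
ffff88085f55a600 1048 1624224k mm_struct
"""

SLAB_TOTAL_KIB = 5661516 * 4  # SLAB pages from kmem -i, assuming 4 KiB pages


def parse_row(line):
    """Split one listing row into (cache name, object size, size in KiB)."""
    _addr, objsize, size, name = line.split()
    return name, int(objsize), int(size.rstrip("k"))


rows = [parse_row(line) for line in TOP_CACHES.splitlines()]

for name, objsize, kib in rows:
    share = 100 * kib / SLAB_TOTAL_KIB
    print(f"{name:15s} {kib / 2**20:6.2f} GiB  {share:5.1f}% of slab")

top5_kib = sum(kib for _name, _objsize, kib in rows)
print(f"top 5 caches: {top5_kib / 2**20:.2f} GiB")
```

With these numbers, task_struct alone is roughly half of all slab memory.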
6. Check for tainted modules.
crash> mod -t
no tainted modules
--
crash> sys -t
TAINTED_MASK: 0
7. Process states: check for zombie processes.
crash> ps -S
  RU: 42
  IN: 755
As we can see above, all tasks are either running (RU) or in interruptible sleep (IN); there are no zombie (ZO) processes.
8. Get total tasks count.
crash> ps | wc -l
798
One of these lines is the ps header, hence the total task count is 797, which matches the TASKS field in the sys output.
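The state counts from step 7 and the line count from step 8 can be cross-checked against each other. The following is a small sketch, not part of the crash tooling; `parse_state_counts` is a hypothetical helper that reads the "STATE: count" pairs printed by `ps -S`.

```python
# Output of "ps -S" and "ps | wc -l" as shown in steps 7 and 8.
PS_S_OUTPUT = """\
RU: 42
IN: 755
"""
PS_LINE_COUNT = 798  # includes one header line


def parse_state_counts(text):
    """Return {state: count} from 'STATE: count' lines."""
    counts = {}
    for line in text.splitlines():
        state, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[state.strip()] = int(value)
    return counts


counts = parse_state_counts(PS_S_OUTPUT)
total_tasks = sum(counts.values())
zombies = counts.get("ZO", 0)  # ZO is the state code crash uses for zombies

print(f"tasks by state: {counts}")
print(f"total tasks: {total_tasks}, zombies: {zombies}")
print(f"matches ps line count minus header: {total_tasks == PS_LINE_COUNT - 1}")
```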
9. Total RSS usage of user-mode processes.
crash> ps -u -G | sed 's/>//g' | awk '{ total += $8 } END { printf "Total RSS of user-mode: %.02f GiB\n", total/2^20 }'
Total RSS of user-mode: 0.30 GiB

crash> ps -Gu | sed 's/^>//' | awk '{ m[$9]+=$8/1024 } END { for (item in m) { printf "%20s %10s MiB\n", item, m[item] } }' | sort -k 2 -r -n | head
     jamaicavm_bin_m    115.355 MiB
              puppet    59.5195 MiB
           BESClient    11.5625 MiB
                sshd    11.2891 MiB
          durability     7.0625 MiB
              monman    7.05859 MiB
          networking     6.6875 MiB
        ssmagent.bin    6.55469 MiB
            rsyslogd    5.23438 MiB
            batchman    5.00781 MiB
Hence the total RSS usage of user-mode processes is less than 1 GiB here.
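For readers who find the awk one-liners hard to follow, the same aggregation can be sketched in Python. The sample rows below are invented for illustration and are not taken from this vmcore; the sketch assumes the usual `ps` column order in crash (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM), with RSS in KiB in column 8 and the command name in column 9, which is what the awk scripts rely on.

```python
from collections import defaultdict

# Hypothetical ps-style rows; the leading ">" marks a task active on a CPU.
SAMPLE_PS = """\
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      1      0   3  ffff88085e4f8000  IN   0.0   19364   1536  init
>  2201      1   7  ffff880851a2c000  RU   0.1  250000   4096  puppet
   2202      1   9  ffff880851a2d000  IN   0.1  250000   2048  puppet
"""


def rss_by_command(text):
    """Sum RSS (KiB) per command name, skipping the header line."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.lstrip(">").split()  # same job as sed 's/^>//'
        if len(fields) < 9 or not fields[7].isdigit():
            continue  # header or malformed row
        totals[fields[8]] += int(fields[7])
    return dict(totals)


rss_kib = rss_by_command(SAMPLE_PS)
total_gib = sum(rss_kib.values()) / 2**20  # KiB -> GiB, like total/2^20 in awk

for comm, kib in sorted(rss_kib.items(), key=lambda item: -item[1]):
    print(f"{comm:>20s} {kib / 1024:10.3f} MiB")
print(f"Total RSS of user-mode: {total_gib:.2f} GiB")
```

Like the awk version, this splits on whitespace, so a command name containing spaces would be truncated to its first word.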
10. Check allocation of kernel threads.
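crash> ps -k -G | sed 's/>//g' | awk '{ total += $8 } END { printf "Total RSS of user-mode: %.02f GiB\n", total/2^20 }'
Total RSS of user-mode: 0.00 GiB
As we can see above, kernel threads account for no RSS. The printf label still says "user-mode" because the command is reused from step 9; the -k option restricts ps to kernel threads.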
11. Check the space allocated to page tables, kernel stacks, and dirty pages.
crash> kmem -V | grep -e NR_PAGETABLE -e NR_KERNEL_STACK -e NR_FILE_DIRTY
  NR_FILE_DIRTY: 12595   <<<< less than 50MiB
   NR_PAGETABLE: 1509   <<<< less than 6MiB
NR_KERNEL_STACK: 1420330   <<<< ~5.4GiB
~~~
1420330*4
5681320
./1024/1024
5.41812896728515625000   <<<<
~~~
As we can see above, around 5.5 GiB in total is used by page tables, kernel stacks, and dirty pages, almost all of it by kernel stacks.
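The three counters can be added up programmatically as well. The sketch below follows the arithmetic used in this step and treats every counter as a number of 4 KiB units; that unit is an assumption carried over from the calculation above, and `parse_counters` is a hypothetical helper name.

```python
# Counter lines copied from the kmem -V output in this step.
KMEM_V_LINES = """\
NR_FILE_DIRTY: 12595
NR_PAGETABLE: 1509
NR_KERNEL_STACK: 1420330
"""

UNIT_KIB = 4  # assumption: each counter is in 4 KiB units, as in the bc calculation


def parse_counters(text):
    """Return {counter name: value} from 'NAME: value' lines."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


counters = parse_counters(KMEM_V_LINES)

for name, value in counters.items():
    print(f"{name:16s} {value * UNIT_KIB / 1024:10.1f} MiB")

total_units = sum(counters.values())
total_gib = total_units * UNIT_KIB / 2**20
print(f"total: {total_gib:.2f} GiB")
```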
12. Check out number of hugepages.
crash> kmem -h
HSTATE SIZE FREE TOTAL NAME
ffffffff81c368a0 2MB 0 0 hugepages-2048kB
As we can see above, no hugepages are configured.
13. Check cgroup page structure size.
crash> log | grep -e page_cgroup -e crash
[    0.000000] Reserving 164MB of memory at 704MB for crashkernel (System RAM: 65410MB)   <<<< crashkernel=164M
--
[    0.000000] allocated 1073741824 bytes of page_cgroup   <<<< 1GiB in page_cgroup
As we can see above, only 1 GiB is used by the 'page_cgroup' structures.
14. Hence the summation of the accounted memory in the above analysis is as follows:
Slab                             21.6G
Cache+Buffer+Shared               1.8G
Hugepage                          0
kernelstack+pagetables+dirty      5.5G
page_cgroup                       1G
Zombie processes                  none
Kernel threads                    no allocation
Process RSS of userspace          0.3G
--------------------------------------
Accounted                        30.2G
Total used (kmem -i USED)        57.4G
Unaccounted memory is ~27.2 GiB
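The final accounting is simple addition and subtraction, sketched below with the rounded figures from the steps above. The dictionary keys are just labels chosen for this example.

```python
# Rounded figures (GiB) collected in steps 3 to 13.
accounted_gib = {
    "slab": 21.6,
    "cache + buffers + shared": 1.8,
    "hugepages": 0.0,
    "kernel stack + page tables + dirty": 5.5,
    "page_cgroup": 1.0,
    "user-space RSS": 0.3,
}
used_gib = 57.4  # USED from kmem -i

accounted = round(sum(accounted_gib.values()), 1)
unaccounted = round(used_gib - accounted, 1)

for label, gib in accounted_gib.items():
    print(f"{label:36s} {gib:5.1f} GiB")
print(f"{'accounted':36s} {accounted:5.1f} GiB")
print(f"{'unaccounted':36s} {unaccounted:5.1f} GiB")
```

Because the inputs are already rounded to one decimal place, the result is only accurate to a few hundred MiB.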