How to analyze memory in a kernel crash dump on Linux

In this article, we look at how to analyze the core file (vmcore) generated after a system crash.

It shows how to extract basic information from the core file with the crash utility and how to work out where the server's memory was being used at the time of the crash.

Let's see how to get different memory data using different crash commands.

1. System Information

crash> sys
      KERNEL: /cores/retrace/repos/kernel/x86_64/usr/lib/debug/lib/modules/3.10.0-693.33.1.rt56.621.el6rt.x86_64/vmlinux
    DUMPFILE: /cores/retrace/tasks/747611042/crash/vmcore  [PARTIAL DUMP]
        CPUS: 40
        DATE: Tue Jun 15 09:44:35 EDT 2021
      UPTIME: 18 days, 04:16:22
LOAD AVERAGE: 1.53, 1.53, 1.51
       TASKS: 797
    NODENAME: ngelinux002
     RELEASE: 3.10.0-693.33.1.rt56.621.el6rt.x86_64   <<<<<<<<<<<<< RT kernel 
     VERSION: #1 SMP PREEMPT RT Fri May 25 11:46:21 EDT 2018  <<<< kernel build date
     MACHINE: x86_64  (2597 Mhz)
      MEMORY: 63.9 GB
       PANIC: "SysRq : Trigger a crash"  <<<<<<< Manually panicked.

 

2. Get System Vendor/Hardware Information

crash> sys -i
        DMI_BIOS_VENDOR: HP
       DMI_BIOS_VERSION: P89
          DMI_BIOS_DATE: 03/25/2019
         DMI_SYS_VENDOR: HP
       DMI_PRODUCT_NAME: ProLiant DL360 Gen9
    DMI_PRODUCT_VERSION: 
     DMI_PRODUCT_SERIAL: CZJ51309LZ
       DMI_PRODUCT_UUID: 32353537-3835-5A43-4A35-314430394C5D
       DMI_BOARD_VENDOR: HP
         DMI_BOARD_NAME: ProLiant DL360 Gen9
      DMI_BOARD_VERSION: 
       DMI_BOARD_SERIAL: CZJ51309LZ
    DMI_BOARD_ASSET_TAG:         
     DMI_CHASSIS_VENDOR: HP
       DMI_CHASSIS_TYPE: 23
    DMI_CHASSIS_VERSION: 
     DMI_CHASSIS_SERIAL: CZJ51309LZ
  DMI_CHASSIS_ASSET_TAG:     

 

3. Memory statistics.

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM  16419806      62.6 GB         ----
         FREE  1367564       5.2 GB    8% of TOTAL MEM
         USED  15052242      57.4 GB   91% of TOTAL MEM  <<<<<< 57.4G memory was used at the time of crash

       SHARED   207840     811.9 MB    1% of TOTAL MEM 
      BUFFERS   173508     677.8 MB    1% of TOTAL MEM	
       CACHED   102194     399.2 MB    0% of TOTAL MEM 

         SLAB  5661516      21.6 GB   34% of TOTAL MEM  <<<<< 21.6G was used in SLAB 

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE  <<< nothing at HUGEPAGE

   TOTAL SWAP  4194303        16 GB         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE  4194303        16 GB  100% of TOTAL SWAP

 COMMIT LIMIT  12404206      47.3 GB         ----
    COMMITTED   318538       1.2 GB    2% of TOTAL LIMIT
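
The TOTAL and PERCENTAGE columns can be re-derived from the PAGES column. The following is a minimal sketch, assuming 4 KiB pages (the usual x86_64 page size) and assuming the percentages are truncated rather than rounded, which is what the figures above suggest. The helper names are made up for illustration.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, the usual x86_64 page size


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB."""
    return pages * PAGE_SIZE_KIB / 1024 / 1024


def percent_of(pages: int, total_pages: int) -> int:
    """Whole-number percentage, truncated (assumed to match the output above)."""
    return pages * 100 // total_pages


# Page counts copied from the kmem -i output above.
total, free, used, slab = 16419806, 1367564, 15052242, 5661516

for name, pages in (("FREE", free), ("USED", used), ("SLAB", slab)):
    print(f"{name}: {pages_to_gib(pages):.1f} GiB, {percent_of(pages, total)}% of total")
```

Note that the sizes crash labels "GB" come out of this calculation as binary units (GiB).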

 

4. Get unreclaimable memory

crash> kmem -V|grep -i slab
          NR_SLAB_RECLAIMABLE: 132533
        NR_SLAB_UNRECLAIMABLE: 5528983  <<<<< ~21G is unreclaimable
                SLABS_SCANNED: 1852416
                    DROP_SLAB: 3

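The counters are in pages of 4 KiB each, so the conversion to GiB is (in bc, where "." is the last result):

5528983*4
22115932        <<<< in KiB
./1024/1024
21.09139633178710937500  <<<< in GiB, unreclaimable SLAB

Of this, the task_struct slab cache alone accounts for about 11G, as the per-cache listing in the next section shows.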
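
As a cross-check, the two slab counters should add up to the SLAB row reported by kmem -i in section 3. A small sketch of that check, again assuming 4 KiB pages; the variable names are only illustrative:

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages

slab_reclaimable = 132533      # NR_SLAB_RECLAIMABLE, in pages
slab_unreclaimable = 5528983   # NR_SLAB_UNRECLAIMABLE, in pages
slab_total_kmem_i = 5661516    # SLAB row of kmem -i, in pages


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB."""
    return pages * PAGE_SIZE_KIB / 1024 / 1024


slab_sum = slab_reclaimable + slab_unreclaimable
unreclaimable_share = slab_unreclaimable / slab_sum

print("matches kmem -i SLAB row:", slab_sum == slab_total_kmem_i)
print(f"unreclaimable: {pages_to_gib(slab_unreclaimable):.2f} GiB")
print(f"share of slab that is unreclaimable: {unreclaimable_share:.1%}")
```

Almost all of the slab memory is unreclaimable, so dropping caches would not have freed it.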

 

5. Check the highest memory-consuming slab caches.

crash> kmem -s | awk '{print $1,$2, $5*$6"k", $7}' | sort -nrk3 | column -t | head
ffff88085f55a100  6616  11363072k  task_struct  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
ffff88085f403700  256   2702328k   kmalloc-256
ffff88085f55ad00  368   2271744k   filp
ffff88085f55a400  1280  1815840k   signal_cache
ffff88085f55a600  1048  1624224k   mm_struct
ffff88085f55a200  832   1165984k   task_xstate
ffff88085f403800  192   442256k    kmalloc-192
ffff88085f55b100  160   222952k    sigqueue
ffff88085f403900  128   202520k    kmalloc-128
ffff8810358a4500  944   153312k    ext3_inode_cache
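
The awk pipeline multiplies the number of slabs by the slab size to get the memory held by each cache. The same logic can be sketched in Python for a saved copy of the kmem -s output. This assumes the column order CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME, which is what the awk field numbers imply. The function name and the sample rows are hypothetical and are not taken from this vmcore.

```python
def cache_sizes(kmem_s_text: str):
    """Return (cache name, size in KiB) pairs, largest first.

    Assumes columns: CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME.
    """
    rows = []
    for line in kmem_s_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 7-column data row.
        if len(fields) != 7 or not fields[4].isdigit():
            continue
        slabs = int(fields[4])
        ssize_kib = int(fields[5].rstrip("k"))  # SSIZE is printed like "32k"
        rows.append((fields[6], slabs * ssize_kib))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample rows, made up for illustration only.
sample = """\
CACHE             OBJSIZE  ALLOCATED  TOTAL  SLABS  SSIZE  NAME
ffff880000000100      256        900   1024     64     4k  example-cache-a
ffff880000000200     1024        300    512     16    32k  example-cache-b
"""

for name, kib in cache_sizes(sample):
    print(f"{name:20s} {kib} KiB")
```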

 

6. Check for tainted modules.

crash> mod -t
no tainted modules
--
crash> sys -t
TAINTED_MASK: 0 

 

7. Process states: check for zombie processes.

crash> ps -S
  RU: 42
  IN: 755

As we can see above, there are no zombie processes: every task is either running (RU) or in interruptible sleep (IN).

 

8. Get the total task count.

crash> ps|wc -l
798

The ps output includes one header line, so the total task count is 797, which matches the TASKS field in the sys output.
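
The per-state counts from section 7 and the line count above can be reconciled with a few lines of Python. This is only a sketch: the parsing helper is hypothetical, and it assumes crash reports zombies under the state abbreviation ZO.

```python
# ps -S output copied from section 7.
ps_s_output = """\
  RU: 42
  IN: 755
"""


def state_counts(text: str) -> dict:
    """Parse 'STATE: count' lines into a dictionary."""
    counts = {}
    for line in text.splitlines():
        state, _, number = line.partition(":")
        if number.strip().isdigit():
            counts[state.strip()] = int(number)
    return counts


counts = state_counts(ps_s_output)
total_from_states = sum(counts.values())
total_from_wc = 798 - 1          # ps | wc -l, minus the header line
zombies = counts.get("ZO", 0)    # assumption: ZO is the zombie state label

print(total_from_states, total_from_wc, zombies)
```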

 

9. Total RSS usage of user-mode processes.

crash> ps -u -G | sed 's/>//g' | awk '{ total += $8 } END { printf "Total RSS of user-mode: %.02f GiB\n", total/2^20 }'
Total RSS of user-mode: 0.30 GiB


crash> ps -Gu | sed 's/^>//' | awk '{ m[$9]+=$8/1024 } END { for (item in m) { printf "%20s %10s MiB\n", item, m[item] } }' | sort -k 2 -r -n|head
     jamaicavm_bin_m    115.355 MiB
              puppet    59.5195 MiB
           BESClient    11.5625 MiB
                sshd    11.2891 MiB
          durability     7.0625 MiB
              monman    7.05859 MiB
          networking     6.6875 MiB
        ssmagent.bin    6.55469 MiB
            rsyslogd    5.23438 MiB
            batchman    5.00781 MiB

Hence the total RSS of user-mode processes is only about 0.3 GiB here, with no single process using more than about 115 MiB.
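
The per-command aggregation done by the second awk pipeline can also be written in Python for a saved copy of the ps output. This sketch assumes the column order PID PPID CPU TASK ST %MEM VSZ RSS COMM, which is what the awk field numbers ($8 for RSS in KiB, $9 for the command) imply. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def rss_by_command(ps_text: str) -> dict:
    """Sum RSS (KiB) per command name and return MiB per command.

    Assumes columns: PID PPID CPU TASK ST %MEM VSZ RSS COMM.
    """
    totals = defaultdict(float)
    for line in ps_text.splitlines():
        # Drop the leading ">" that marks tasks active on a CPU.
        fields = line.lstrip("> ").split()
        # Skip the header and short or malformed lines.
        if len(fields) < 9 or not fields[7].isdigit():
            continue
        totals[fields[8]] += int(fields[7]) / 1024
    return dict(totals)


# Hypothetical sample rows, made up for illustration only.
sample = """\
   PID    PPID  CPU        TASK        ST  %MEM    VSZ    RSS  COMM
>  101       1    0  ffff880000001000  RU   0.0  10240   2048  example-a
   102       1    1  ffff880000002000  IN   0.0  20480   1024  example-a
   103       1    2  ffff880000003000  IN   0.0   4096    512  example-b
"""

for command, mib in sorted(rss_by_command(sample).items(), key=lambda kv: -kv[1]):
    print(f"{command:>20s} {mib:10.2f} MiB")
```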

 

10. Check memory allocated to kernel threads.

crash> ps -k -G | sed 's/>//g' | awk '{ total += $8 } END { printf "Total RSS of kernel threads: %.02f GiB\n", total/2^20 }'
Total RSS of kernel threads: 0.00 GiB

As we can see above, no RSS is attributed to kernel threads. This is expected, since kernel threads have no user address space of their own.

 

11. Check the space allocated to page tables, kernel stacks, and dirty pages.

crash> kmem -V|grep -e NR_PAGETABLE -e NR_KERNEL_STACK -e NR_FILE_DIRTY
                NR_FILE_DIRTY: 12595  <<<<<< less than 50MiB
                 NR_PAGETABLE: 1509   <<<<<<< less than 6MiB
              NR_KERNEL_STACK: 1420330  <<<< ~5.4GiB

1420330*4
5681320         <<<< in KiB
./1024/1024
5.41812896728515625000  <<<< in GiB

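As we can see above, around 5.5 GiB in total is used by page tables, kernel stacks and dirty pages, almost all of it by kernel stacks. This figure assumes that each counter is expressed in 4 KiB pages.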
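
The combined figure can be reproduced as follows. This is a minimal sketch under the same assumption that every counter is a count of 4 KiB pages; the variable names are illustrative only.

```python
PAGE_SIZE_KIB = 4  # assumption: every counter is a count of 4 KiB pages

# Counter values copied from the kmem -V output above.
counters = {
    "NR_FILE_DIRTY": 12595,
    "NR_PAGETABLE": 1509,
    "NR_KERNEL_STACK": 1420330,
}

# Size of each counter in MiB.
sizes_mib = {name: pages * PAGE_SIZE_KIB / 1024 for name, pages in counters.items()}
total_gib = sum(sizes_mib.values()) / 1024

for name, mib in sizes_mib.items():
    print(f"{name:>16s}: {mib:10.1f} MiB")
print(f"{'total':>16s}: {total_gib:10.2f} GiB")
```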

 

12. Check the number of hugepages.

crash> kmem -h
     HSTATE        SIZE    FREE   TOTAL  NAME
ffffffff81c368a0    2MB       0       0  hugepages-2048kB

As we can see above, no hugepages are configured.

 

13. Check the size of the page_cgroup structures.

crash> log|grep -e page_cgroup -e crash 
[    0.000000] Reserving 164MB of memory at 704MB for crashkernel (System RAM: 65410MB)  <<<< crashkernel=164M
--
[    0.000000] allocated 1073741824 bytes of page_cgroup  <<<< 1GiB in page_cgroup

As we can see above, 1 GiB is used by the 'page_cgroup' structures.

 

14. Summing up the memory accounted for in the analysis above:

Slab                                    21.6G
Cache + Buffers + Shared                1.8G
Hugepages                               0
Kernel stack + page tables + dirty      5.5G
page_cgroup                             1G
Zombie processes                        none
Kernel threads                          no allocation
User-space process RSS                  0.3G
                                        -----------------
Accounted                               30.2G
Used (from kmem -i)                     57.4G

Unaccounted memory is                   ~27.2 GiB
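
The arithmetic behind the summary can be checked with a short script. It is only a sketch that reuses the rounded figures from the table, so the result is accurate to about one decimal place.

```python
# Rounded figures (GiB) taken from the summary table above.
accounted_gib = {
    "slab": 21.6,
    "cache + buffers + shared": 1.8,
    "hugepages": 0.0,
    "kernel stack + page tables + dirty": 5.5,
    "page_cgroup": 1.0,
    "user-space process RSS": 0.3,
}
used_gib = 57.4  # USED row of kmem -i

accounted = sum(accounted_gib.values())
unaccounted = used_gib - accounted

print(f"accounted:   {accounted:.1f} GiB")
print(f"unaccounted: {unaccounted:.1f} GiB")
```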
