What is a hung task, and how do we make a RHEL/CentOS system panic when a task stays hung for a specific period of time?

The Linux kernel detects a hung task by scanning for processes that have stayed in the uninterruptible sleep (D) state for a long time. Such a process is waiting for some event or resource and is usually not going to move forward while it is stalled in this state.
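To see which tasks are in the D state at a given moment, the process list can be filtered on its state column. This is a minimal sketch; the function name `list_d_state` is made up for illustration, and it expects lines of the form "STATE PID COMMAND" on standard input.

```shell
# list_d_state: read "STATE PID COMMAND" lines on stdin and print only
# the tasks whose state starts with D (uninterruptible sleep).
list_d_state() {
    awk '$1 ~ /^D/ { print }'
}

# Typical use on a live system (the empty "=" headers suppress the header line):
# ps -e -o state= -o pid= -o comm= | list_d_state
```

A task that shows up briefly in this list is normal. Only a task that stays in it longer than the configured timeout is treated as hung.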

In this article, we will see how to make the kernel panic when such a hung task is detected, so that a kernel dump is collected and the hang can be troubleshot later.

We assume that kernel dump is already enabled on the system and that core collection is working fine.

1. Enable Panic on Hung Task (Temporarily, until the next reboot)

# echo '1' > /proc/sys/kernel/hung_task_panic 
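After writing the value, it is worth reading it back to confirm that the kernel accepted it. This is a small sketch; `hung_panic_status` is a hypothetical helper name, and the optional argument exists only so that the function can be pointed at another file.

```shell
# hung_panic_status [FILE]
# Report whether panic-on-hung-task is switched on.
# FILE defaults to the real control file under /proc.
hung_panic_status() {
    file="${1:-/proc/sys/kernel/hung_task_panic}"
    if [ ! -r "$file" ]; then
        # The file is missing or unreadable, for example when the kernel
        # was built without hung task detection.
        echo "unavailable"
        return 1
    fi
    case "$(cat "$file")" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}
```

The same runtime change can also be made with `sysctl -w kernel.hung_task_panic=1`, which writes to the same /proc file.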

 

2. Enable Panic on Hung Task (Permanently)

#### a. Open the sysctl configuration file for editing
# vi /etc/sysctl.conf

#### b. Add the line below to enable panic on hung task
kernel.hung_task_panic = 1

#### c. Load the settings from /etc/sysctl.conf so the parameter takes effect
# sysctl -p
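If the edit has to be scripted rather than made by hand in vi, the line can be added without creating duplicates. The function below is only a sketch: `set_sysctl_line` is a made-up name, and it assumes the file uses the plain "key = value" layout shown above.

```shell
# set_sysctl_line KEY VALUE [FILE]
# Replace an existing "KEY = ..." line in FILE, or append one if absent.
# FILE defaults to /etc/sysctl.conf.
set_sysctl_line() {
    key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
    # Escape dots so the key is matched literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$conf" 2>/dev/null; then
        # Rewrite the matching line through a temporary file, then copy the
        # result back so the original file keeps its owner and permissions.
        tmp=$(mktemp) || return 1
        sed -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp" \
            && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Example (as root), followed by a reload:
# set_sysctl_line kernel.hung_task_panic 1 && sysctl -p
```

Running it twice leaves a single `kernel.hung_task_panic` line in the file, which keeps `sysctl -p` output easy to read.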

 

3. Let's check all four hung task parameters
There are two ways to check them.
The first one, grepping the files under /proc, is preferred because it returns output even when some of the parameters are not set.
The sysctl command, however, shows no output when none of the parameters is set.

# grep -Hv "zz" /proc/sys/kernel/hung*
/proc/sys/kernel/hung_task_check_count:4194304
/proc/sys/kernel/hung_task_panic:1
/proc/sys/kernel/hung_task_timeout_secs:240
/proc/sys/kernel/hung_task_warnings:15

# sysctl -q kernel | grep hung | sort
kernel.hung_task_check_count = 32768
kernel.hung_task_panic = 1
kernel.hung_task_timeout_secs = 240
kernel.hung_task_warnings = 15
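The /proc approach can also be wrapped in a small function that prints the values in the same "kernel.name = value" form as sysctl. This is a sketch with a hypothetical name, `show_hung_params`; the directory argument is optional and defaults to /proc/sys/kernel.

```shell
# show_hung_params [DIR]
# Print every hung_task_* tunable found in DIR as "kernel.name = value".
show_hung_params() {
    dir="${1:-/proc/sys/kernel}"
    for f in "$dir"/hung_task_*; do
        [ -r "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        printf 'kernel.%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Because the shell expands the glob in sorted order, the output needs no extra `sort`.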

 

4. Understanding the hung task parameters

 

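Parameter | Usage
kernel.hung_task_check_count = 32768 | Upper limit on the number of tasks checked; here at most 32768 processes are examined.
kernel.hung_task_panic = 1 | Tells the system to panic when a task has been blocked for longer than the hung_task_timeout_secs value. With 0, the kernel only logs a warning and keeps running.
kernel.hung_task_timeout_secs = 240 | A task is considered hung when it has stayed in the D state for more than 240 seconds here, and a warning is issued. A value of 0 disables the check.
kernel.hung_task_warnings = 15 | Maximum number of hung task warnings to report. Each reported warning decreases this counter by one, and once it reaches 0 no further warnings are logged. It does not trigger the panic; that is controlled by hung_task_panic.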
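Each hung task warning is written to the kernel log, so the log can be searched for earlier reports. The sketch below assumes the usual message wording, "INFO: task NAME:PID blocked for more than N seconds.", which may differ between kernel versions; `hung_task_reports` is a made-up helper name.

```shell
# hung_task_reports: read kernel log text on stdin and print one
# "NAME:PID SECONDS" line for every hung task warning found.
hung_task_reports() {
    sed -n -E 's/.*INFO: task (.+):([0-9]+) blocked for more than ([0-9]+) seconds.*/\1:\2 \3/p'
}

# Typical use on a live system:
# dmesg | hung_task_reports
```

With kernel.hung_task_panic = 1 the system panics at the first detection, so these lines are mostly useful when reading the log saved alongside the kernel dump.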

 
