Solved: Why Linux Server rebooted when all LUN paths failed?

By Saket Jain Published July 9, 2021 Linux/Unix

In this article, we will look at an interesting issue I faced on one of our production servers, which rebooted automatically during a SAN switch replacement activity.

Issue
One of our servers panicked and rebooted automatically when all the LUN paths to the server were disconnected.

Root Cause
We found error messages in the logs reporting that a task was in a hung state; the warning was generated multiple times.

We then checked and found that hung task panic was enabled on the server.

### kernel.hung_task_panic = 1 is set in /etc/sysctl.conf, or /proc/sys/kernel/hung_task_panic contains the value 1
# sysctl -p | grep -i hung_task
kernel.hung_task_panic = 1

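Hung task panic was enabled on the server. When we took down all the paths, I/O to the storage devices stalled, and the processes waiting on that I/O stayed blocked in uninterruptible sleep. With kernel.hung_task_panic set to 1, the kernel does not just log a warning for such a hung task; it panics, and that panic is what rebooted the system.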
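To see the settings involved on a given server, something like the following sketch can help. It prints kernel.hung_task_panic together with kernel.hung_task_timeout_secs, the companion setting that controls how long a task may stay blocked before the warning is raised. The function name show_hung_task_settings and its optional directory argument are my own, added only so the function can be tried against a scratch directory; on a real server it reads /proc/sys/kernel.

```shell
#!/bin/sh
# Print the hung task detector settings found under a sysctl directory.
# The directory defaults to /proc/sys/kernel; the argument is a hypothetical
# convenience for trying the function without touching the live system.
show_hung_task_settings() {
    dir="${1:-/proc/sys/kernel}"
    for name in hung_task_panic hung_task_timeout_secs; do
        if [ -r "$dir/$name" ]; then
            echo "kernel.$name = $(cat "$dir/$name")"
        else
            # The files are absent if the kernel lacks hung task detection.
            echo "kernel.$name is not available"
        fi
    done
}

show_hung_task_settings
```

If the first line shows a value of 1, a hung task leads to a panic rather than only a warning in the logs.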

Solution
kernel.hung_task_panic should be disabled on a production server unless it is needed for a special situation, such as diagnosing a problem.
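As a rough sketch of how this could be done, the helper below rewrites the setting in a sysctl configuration file so the change survives a reboot, and the commented lines show the runtime commands. The helper name disable_hung_task_panic_in is hypothetical, and it assumes the setting lives in a single file such as /etc/sysctl.conf; on systems that use drop-in files under /etc/sysctl.d, the relevant file may differ.

```shell
#!/bin/sh
# Set kernel.hung_task_panic to 0 in a sysctl configuration file.
# $1 = path to the file; a backup copy is kept as "$1.bak".
# (Hypothetical helper for illustration, not part of any package.)
disable_hung_task_panic_in() {
    conf="$1"
    cp "$conf" "$conf.bak" || return 1
    if grep -q '^[[:space:]]*kernel\.hung_task_panic[[:space:]]*=' "$conf"; then
        # Rewrite the existing entry, reading from the backup copy.
        sed 's/^[[:space:]]*kernel\.hung_task_panic[[:space:]]*=.*/kernel.hung_task_panic = 0/' \
            "$conf.bak" > "$conf"
    else
        # No entry yet: append one.
        echo 'kernel.hung_task_panic = 0' >> "$conf"
    fi
}

# Typical use as root (left commented out so nothing changes by accident):
#   sysctl -w kernel.hung_task_panic=0            # takes effect immediately
#   disable_hung_task_panic_in /etc/sysctl.conf   # persists across reboots
#   sysctl -p | grep -i hung_task                 # reload and confirm
```

With the value at 0, the kernel still logs the hung task warnings, so the underlying I/O problem remains visible without taking the server down.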

Another way is to set a limit on multipath device queueing so that I/O does not wait indefinitely and panic the kernel.

Usually, when all LUNs become unavailable and no_path_retry is set to a high value such as 300, the processes waiting on these LUNs stay blocked in uninterruptible sleep for a long time, which causes the panic.
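To make the numbers concrete, here is a small back-of-the-envelope sketch. It assumes the queueing window is roughly no_path_retry multiplied by polling_interval, and it uses 5 seconds for polling_interval and 120 seconds for kernel.hung_task_timeout_secs. Those two values are assumed common defaults, not figures from this incident, so check them on your own server. The function names are made up for this example.

```shell
#!/bin/sh
# Rough estimate of how long multipath keeps queueing I/O once all paths
# are down: no_path_retry (a retry count) times polling_interval (seconds).
queue_seconds() {
    echo $(( $1 * $2 ))
}

# Compare the queueing window with the hung task timeout.
# $1 = no_path_retry, $2 = polling_interval, $3 = hung_task_timeout_secs
check_queueing() {
    queued=$(queue_seconds "$1" "$2")
    if [ "$queued" -gt "$3" ]; then
        echo "no_path_retry=$1: queues ~${queued}s, longer than the ${3}s hung task timeout"
    else
        echo "no_path_retry=$1: queues ~${queued}s, within the ${3}s hung task timeout"
    fi
}

check_queueing 300 5 120   # the high value mentioned above
check_queueing 12 5 120    # a lower, illustrative value
```

Under these assumptions, a value of 300 keeps I/O queued for about 1500 seconds, far past the point where blocked tasks are reported as hung, while a small value lets the I/O fail back to the application first. no_path_retry is set in /etc/multipath.conf, for example in the defaults section; the right value depends on how long your storage failovers normally take.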
