Re: [PATCH] hung_task: Allow hung_task_panic when hung_task_warnings is 0.

From: Andrew Morton
Date: Fri Sep 09 2016 - 16:13:50 EST


On Fri, 9 Sep 2016 15:43:34 -0400 jsiddle@xxxxxxxxxx wrote:

> From: John Siddle <jsiddle@xxxxxxxxxx>
>
> Previously hung_task_panic would not be respected if enabled after
> hung_task_warnings had already been decremented to 0.
>
> Permit the kernel to panic if hung_task_panic is enabled after
> hung_task_warnings has already been decremented to 0 and another task
> hangs for hung_task_timeout_secs seconds.
>
> Check if hung_task_panic is enabled so we don't return prematurely, and
> check if hung_task_warnings is non-zero so we don't print the warning
> unnecessarily.
>
> ...
>
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -98,7 +98,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>
>  	trace_sched_process_hang(t);
>
> -	if (!sysctl_hung_task_warnings)
> +	if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
>  		return;
>
>  	if (sysctl_hung_task_warnings > 0)
> @@ -108,16 +108,18 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>  	 * Ok, the task did not get scheduled for more than 2 minutes,
>  	 * complain:
>  	 */
> -	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> -		t->comm, t->pid, timeout);
> -	pr_err(" %s %s %.*s\n",
> -		print_tainted(), init_utsname()->release,
> -		(int)strcspn(init_utsname()->version, " "),
> -		init_utsname()->version);
> -	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
> -		" disables this message.\n");
> -	sched_show_task(t);
> -	debug_show_held_locks(t);
> +	if (sysctl_hung_task_warnings) {
> +		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
> +			t->comm, t->pid, timeout);
> +		pr_err(" %s %s %.*s\n",
> +			print_tainted(), init_utsname()->release,
> +			(int)strcspn(init_utsname()->version, " "),
> +			init_utsname()->version);
> +		pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
> +			" disables this message.\n");
> +		sched_show_task(t);
> +		debug_show_held_locks(t);
> +	}

This introduces an off-by-one error. In the old code, if
sysctl_hung_task_warnings==1 on entry, we warn. With the new code, we
no longer warn.
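
To make the off-by-one concrete, here is a minimal userspace sketch of the two orderings. It is not kernel code: the function names old_check() and patched_check() are made up for illustration, the int behind the pointer stands in for sysctl_hung_task_warnings, and the return value stands in for "the pr_err() block ran".

```c
#include <stdbool.h>

/*
 * Ordering before the patch: return early at 0, decrement a positive
 * counter, then warn unconditionally.
 * *warnings is a hypothetical stand-in for sysctl_hung_task_warnings.
 */
static bool old_check(int *warnings)
{
	if (!*warnings)
		return false;
	if (*warnings > 0)
		(*warnings)--;
	return true;	/* the warning block was not guarded here */
}

/*
 * Ordering in the quoted patch: the decrement still happens first,
 * but the warning block is now guarded by the already-decremented value.
 */
static bool patched_check(int *warnings, bool panic_enabled)
{
	if (!*warnings && !panic_enabled)
		return false;
	if (*warnings > 0)
		(*warnings)--;
	return *warnings != 0;	/* a counter of 1 is already 0 by now */
}
```

Starting from a counter of 1, old_check() reports a warning and leaves the counter at 0, while patched_check() leaves the counter at 0 without reporting anything, so the last permitted warning is lost.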

This?

--- a/kernel/hung_task.c~hung_task-allow-hung_task_panic-when-hung_task_warnings-is-0-fix
+++ a/kernel/hung_task.c
@@ -101,14 +101,12 @@ static void check_hung_task(struct task_
 	if (!sysctl_hung_task_warnings && !sysctl_hung_task_panic)
 		return;
 
-	if (sysctl_hung_task_warnings > 0)
-		sysctl_hung_task_warnings--;
-
 	/*
 	 * Ok, the task did not get scheduled for more than 2 minutes,
 	 * complain:
 	 */
 	if (sysctl_hung_task_warnings) {
+		sysctl_hung_task_warnings--;
 		pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
 			t->comm, t->pid, timeout);
 		pr_err(" %s %s %.*s\n",
_
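
For comparison, a minimal userspace sketch of the ordering in the fixup above, where the decrement moves inside the guarded block. As before, this is not kernel code: fixed_check() is a made-up name, the int behind the pointer stands in for sysctl_hung_task_warnings, and the return value stands in for "the pr_err() block ran".

```c
#include <stdbool.h>

/*
 * Ordering in the fixup: test the counter first, then decrement and warn
 * inside the same block.
 * *warnings is a hypothetical stand-in for sysctl_hung_task_warnings.
 */
static bool fixed_check(int *warnings, bool panic_enabled)
{
	if (!*warnings && !panic_enabled)
		return false;

	if (*warnings) {
		(*warnings)--;	/* unconditional, as written in the fixup */
		return true;	/* warning printed */
	}

	/* Counter is 0 but panic is enabled: no warning, fall through. */
	return false;
}
```

With a counter of 1 this warns once and leaves 0, which restores the old behaviour. One thing the sketch makes visible: because the decrement no longer has the "> 0" test, a negative starting value is decremented as well (-1 becomes -2). Whether that matters depends on how negative values are meant to be treated, which this thread does not address.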