Re: [PATCH v2] hung_task : check the value of "sysctl_hung_task_timeout_sec"

From: Liu hua
Date: Wed Mar 26 2014 - 03:58:38 EST


On 2014/3/26 0:25, Satoru Takeuchi wrote:
> At Tue, 25 Mar 2014 16:58:58 +0800,
> Liu hua wrote:
>>
>> On 2014/3/24 4:50, Satoru Takeuchi wrote:
>>> At Sun, 23 Mar 2014 15:54:04 +0800,
>>> Liu Hua wrote:
>>>>
>>>> As sysctl_hung_task_timeout_sec is unsigned long, when this value is
>>>> larger than LONG_MAX/HZ, the function schedule_timeout_interruptible in
>>>> watchdog will return immediately without sleeping and print:
>>>>
>>>> [ 205.452934] schedule_timeout: wrong timeout value ffffffffffffff83
>>>>
>>>> and then the function watchdog will call schedule_timeout_interruptible again
>>>> and again. The screen will be filled with
>>>> "schedule_timeout: wrong timeout value ffffffffffffff83"
>>>>
>>>> This patch adds a check and correction in timeout_jiffies, so that the
>>>> function schedule_timeout_interruptible always gets a valid parameter.
>>>>
>>>> Cc: <stable@xxxxxxxxxxxxxxx>
>>>> Signed-off-by: Liu Hua <sdu.liu@xxxxxxxxxx>
>>>> ---
>>>> kernel/hung_task.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
>>>> index 6df6149..f992286 100644
>>>> --- a/kernel/hung_task.c
>>>> +++ b/kernel/hung_task.c
>>>> @@ -174,8 +174,12 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>>>>
>>>> static unsigned long timeout_jiffies(unsigned long timeout)
>>>> {
>>>> - /* timeout of 0 will disable the watchdog */
>>>> - return timeout ? timeout * HZ : MAX_SCHEDULE_TIMEOUT;
>>>> + /* timeout of 0 or >= LONG_MAX/HZ will disable the watchdog */
>>>> + if ((timeout == 0) || (timeout > MAX_SCHEDULE_TIMEOUT))
>>>
>>> You should check whether sysctl_hung_task_timeout_sec > MAX_SCHEDULE_TIMEOUT/HZ
>>> or not when setting this parameter instead. Then this check isn't necessary here.
>>>
>>> # Just FYI, MAX_SCHEDULE_TIMEOUT should be MAX_SCHEDULE_TIMEOUT/HZ here.
>>>
>>> Thanks,
>>> Satoru
>>
>> Yes, how about this :
>
> I confirmed the following.
>
> - 3.14-rc8: system hung with "hung_task_timeout_secs > LONG_MAX/HZ".
> - 3.14-rc8 with your patch: works fine. I can't set the above-mentioned value any more.
>
> Documenting the possible values (0..LONG_MAX/HZ) in Documentation/sysctl/kernel.txt
> would make this patch better.
>
> Thanks,
> Satoru

Thanks for your attention and suggestions. I have reworked the patch as follows.
Is it appropriate to repost it with the tag "PATCH v3"?

Subject: [PATCH v3] hung_task : check the value of "sysctl_hung_task_timeout_sec"

As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger than LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleeping and print:

[ 205.452934] schedule_timeout: wrong timeout value ffffffffffffff83

and then the function watchdog will call schedule_timeout_interruptible
again and again. The screen will be filled with
"schedule_timeout: wrong timeout value ffffffffffffff83"

This patch adds a range check in sysctl, so that the
function schedule_timeout_interruptible always gets a valid parameter.

Signed-off-by: Liu Hua <sdu.liu@xxxxxxxxxx>
Tested-by: Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx>
---
Documentation/sysctl/kernel.txt | 1 +
kernel/sysctl.c | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index e55124e..855d9b3 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -317,6 +317,7 @@ for more than this value report a warning.
This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.

0: means infinite timeout - no checking done.
+Possible values to set are in range {0..LONG_MAX/HZ}.

==============================================================

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 49e13e1..aae21e8 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -144,6 +144,11 @@ static int min_percpu_pagelist_fract = 8;
static int ngroups_max = NGROUPS_MAX;
static const int cap_last_cap = CAP_LAST_CAP;

+/* this is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs */
+#ifdef CONFIG_DETECT_HUNG_TASK
+static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
+#endif
+
#ifdef CONFIG_INOTIFY_USER
#include <linux/inotify.h>
#endif
@@ -995,6 +1000,7 @@ static struct ctl_table kern_table[] = {
.maxlen = sizeof(unsigned long),
.mode = 0644,
.proc_handler = proc_dohung_task_timeout_secs,
+ .extra2 = &hung_task_timeout_max,
},
{
.procname = "hung_task_warnings",
--
1.9.0

Thanks,
Liu Hua
