Re: please fix FUSION (Was: [v3.13][v3.14][Regression]kthread:makekthread_create()killable)

From: Tetsuo Handa
Date: Sat Mar 22 2014 - 02:25:29 EST


Oleg Nesterov wrote:
> Tetsuo, what do you think?

I don't like blocking SIGKILL while doing operations that depend on memory
allocation by other processes. If the OOM killer is triggered and it chose
the process blocking SIGKILL in mptsas_init() (I know it unlikely happens),
it generates the OOM killer deadlock.

My preference is to fix kthread_create() to handle SIGKILL gracefully.
kthread_create() did not return upon SIGKILL before commit 786235ee.
Since commit 786235ee, there is imbalance that "kmalloc(GFP_KERNEL) in
kthread_create_on_node() ignores SIGKILL unless TIF_MEMDIE is set" but
"wait_for_completion_killable() in kthread_create_on_node() does not ignore
SIGKILL even if TIF_MEMDIE is not set".

Many kernel operations (e.g. mutex_lock() wait_event() wait_for_completion())
ignore SIGKILL and the current users depend on SIGKILL being ignored. Thus,
commit 786235ee sounds like a kernel API breakage.
----------
>From 731f1f6dec7efaa132f751c432079b9b1fdb78d2 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Sat, 22 Mar 2014 15:16:50 +0900
Subject: [PATCH] kthread: Handle SIGKILL gracefully in kthread_create().

Commit 786235ee "kthread: make kthread_create() killable" changed to
leave kthread_create() as soon as receiving SIGKILL. But this change
caused boot failures because systemd-udevd receives SIGKILL due to timeout
while loading SCSI controller drivers using finit_module() [1].

Therefore, this patch changes kthread_create() from "wait forever in
killable state" to "wait for 10 seconds in unkillable state, check for
the OOM killer every second".

This patch also changes the return value of timeout case from -ENOMEM
to -EINTR because -ENOMEM could make sense for only TIF_MEMDIE case.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705

Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
---
kernel/kthread.c | 37 +++++++++++++++++++++----------------
1 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index b5ae3ee..6a25a9f 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -269,6 +269,7 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),
const char namefmt[],
...)
{
+ int i = 0;
DECLARE_COMPLETION_ONSTACK(done);
struct task_struct *task;
struct kthread_create_info *create = kmalloc(sizeof(*create),
@@ -287,24 +288,28 @@ struct task_struct *kthread_create_on_node(int (*threadfn)(void *data),

wake_up_process(kthreadd_task);
/*
- * Wait for completion in killable state, for I might be chosen by
- * the OOM killer while kthreadd is trying to allocate memory for
- * new kernel thread.
+ * Wait for completion with 10 seconds timeout. Unless the kthreadd is
+ * blocked for direct memory reclaim path, the kthreadd will be able to
+ * complete the request within 10 seconds. Also, check every second if
+ * I was chosen by the OOM killer in order to avoid the OOM killer
+ * deadlock.
*/
- if (unlikely(wait_for_completion_killable(&done))) {
- /*
- * If I was SIGKILLed before kthreadd (or new kernel thread)
- * calls complete(), leave the cleanup of this structure to
- * that thread.
- */
- if (xchg(&create->done, NULL))
- return ERR_PTR(-ENOMEM);
- /*
- * kthreadd (or new kernel thread) will call complete()
- * shortly.
- */
- wait_for_completion(&done);
+ do {
+ if (likely(wait_for_completion_timeout(&done, HZ)))
+ goto ready;
+ } while (i++ < 10 && !test_thread_flag(TIF_MEMDIE));
+ /*
+ * The kthreadd was unable to complete the request within 10 seconds
+ * (or I was chosen by the OOM killer). Thus, give up and leave the
+ * cleanup of this structure to the kthreadd (or new kernel thread).
+ */
+ if (xchg(&create->done, NULL)) {
+ WARN(1, "Gave up waiting for kthreadd.\n");
+ return ERR_PTR(-EINTR);
}
+ /* kthreadd (or new kernel thread) will call complete() shortly. */
+ wait_for_completion(&done);
+ready:
task = create->result;
if (!IS_ERR(task)) {
static const struct sched_param param = { .sched_priority = 0 };
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/