Re: [PATCH 0/2] vhost: improve livepatch switching for heavily loaded vhost worker kthreads

From: Seth Forshee
Date: Fri Jan 27 2023 - 15:05:15 EST


On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote:
> On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote:
> > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote:
> > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's
> > > > > timeout period due to busy vhost worker kthreads.
> > > >
> > > > I have missed this detail. Miroslav told me that we have solved
> > > > something similar some time ago, see
> > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@xxxxxxxxxx/
> > >
> > > Interesting thread. I had thought about something along the lines of the
> > > original patch, but there are some ideas in there that I hadn't
> > > considered.
> >
> > Here's another idea, have we considered this? Have livepatch set
> > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then
> > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is
> > set.
> >
> > Not sure how scheduler folks would feel about that ;-)
>
> So, let me try and page all that back in.... :-)
>
> KLP needs to unwind the stack to see if any of the patched functions are
> active, if not, flip task to new set.
>
> Unwinding the stack of a task can be done when:
>
> - task is inactive (stable reg and stack) -- provided it stays inactive
> while unwinding etc..
>
> - task is current (guarantees stack doesn't dip below where we started
> due to being busy on top etc..)
>
> Can NOT be done from interrupt context, because can hit in the middle of
> setting up stack frames etc..
>
> The issue at hand is that some tasks run for a long time without passing
> through an explicit check.
>
> The thread above tried sticking something in cond_resched() which is a
> problem for PREEMPT=y since cond_resched() is a no-op.
>
> Preempt notifiers were raised, and those would actually be nice, except
> you can only install a notifier on current and you need some memory
> allocated per task, which makes it less than ideal. Plus ...
>
> ... putting something in finish_task_switch() wouldn't be the end of the
> world I suppose, but then you still need to force schedule the task --
> imagine it being the only runnable task on the CPU, there's nothing
> going to make it actually switch.
>
> Which then leads me to suggest something daft like this.. does that
> help?

The changes below are working well for me. The context has a read lock
on tasklist_lock so it can't sleep, so I'm using stop_one_cpu_nowait()
with per-CPU state.

Based on Josh's comments it sounds like the klp_have_reliable_stack()
check probably isn't quite right, and we might want to add something
else for PREEMPT+!ORC. But I wanted to go ahead and see if this approach
seems reasonable to everyone.

Thanks,
Seth


diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index f1b25ec581e0..9f3898f02828 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@

#include <linux/cpu.h>
#include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
#include "core.h"
#include "patch.h"
#include "transition.h"
@@ -334,9 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task)
return !ret;
}

+static int __try_switch_kthread(void *arg)
+{
+ return klp_try_switch_task(arg) ? 0 : -EBUSY;
+}
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_stop_work);
+
/*
* Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set.
- * Kthreads with TIF_PATCH_PENDING set are woken up.
+ * Kthreads with TIF_PATCH_PENDING set are preempted or woken up.
*/
static void klp_send_signals(void)
{
@@ -357,11 +365,22 @@ static void klp_send_signals(void)
* would be meaningless. It is not serious though.
*/
if (task->flags & PF_KTHREAD) {
- /*
- * Wake up a kthread which sleeps interruptedly and
- * still has not been migrated.
- */
- wake_up_state(task, TASK_INTERRUPTIBLE);
+ if (task_curr(task) && klp_have_reliable_stack()) {
+ /*
+ * kthread is currently running on a CPU; try
+ * to preempt it.
+ */
+ stop_one_cpu_nowait(task_cpu(task),
+ __try_switch_kthread,
+ task,
+ this_cpu_ptr(&klp_stop_work));
+ } else {
+ /*
+ * Wake up a kthread which sleeps interruptedly
+ * and still has not been migrated.
+ */
+ wake_up_state(task, TASK_INTERRUPTIBLE);
+ }
} else {
/*
* Send fake signal to all non-kthread tasks which are