Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching

From: Josh Poimboeuf
Date: Tue May 06 2014 - 08:12:38 EST


On Mon, May 05, 2014 at 11:49:23PM +0200, Frederic Weisbecker wrote:
> On Mon, May 05, 2014 at 08:43:04PM +0200, Ingo Molnar wrote:
> > If a kernel refuses to patch with certain threads running, that will
> > drive those kernel threads being fixed and such. It's a deterministic,
> > recoverable, reportable bug situation, so fixing it should be fast.
> >
> > We learned these robustness lessons the hard way with kprobes and
> > ftrace dynamic code patching... which are utterly simple compared to
> > live kernel patching!
>
> Yeah, agreed. More rationale behind: we want to put the kthreads into
> semantic sleeps, not just random sleeping point. This way we lower the
> chances to execute new code messing up living state that is expecting old
> code after random preemption or sleeping points.
>
> But by semantic sleeps I mean more than just explicit calls to schedule()
> as opposed to preemption points.
> It also implies shutting down as well the living states handled by the kthread
> such that some sort of re-initialization of the state is also needed when
> the kthread gets back to run.
>
> And that's exactly what good implementations of kthread park provide.
>
> Consider kernel/watchdog.c as an example: when we park the lockup
> detector kthread, it disables the perf event and the hrtimer before it goes
> to actually park and sleep. When the kthread is later unparked, the kthread
> restarts the hrtimer and the perf event.
>
> If we live patch code that has obscure relations with perf or hrtimer here,
> we lower a lot the chances for a crash when the watchdog kthread is parked.
>
> So I'm in favour of parking all possible kthreads before live patching. Freezing
> alone doesn't provide the same state shutdown than parking.
>
> Now since parking looks more widely implemented than kthread freezing, we could
> even think about implementing kthread freezing using parking as backend.

The vast majority of kernel threads on my system don't seem to know
anything about parking or freezing. I see one kthread function which
calls kthread_should_park(), which is smpboot_thread_fn(), used for
ksoftirqd/*, migration/* and watchdog/*. But there are many other
kthread functions which seem to be parking ignorant, including:

cpu_idle_loop
kthreadd
rcu_gp_kthread
worker_thread
rescuer_thread
devtmpfsd
hub_thread
kswapd
ksm_scan_thread
khugepaged
fsnotify_mark_destroy
scsi_error_handler
kauditd_thread
kjournald2
irq_thread
rfcomm_run

Maybe we could modify all these thread functions (and probably more) to
be park and/or freezer capable. But really it wouldn't make much of a
difference IMO. It would only protect careless users from a tiny
percentage of all possible havoc that a careless user could create.

Live patching is a very sensitive and risky operation, and from a kernel
standpoint we should make it as safe as we reasonably can. But we can't
do much about careless users. Ultimately the risk is in the hands of
the user and their choice of patches. They need to absolutely
understand all the implications of patching a particular function. If
the patch changes the way a function interacts with some external data,
then they're starting to tempt fate and they need to be extra careful.
This care needs to be taken for *all* kernel functions, not just for the
few that are called from kernel threads.

Also, the top level kernel thread functions (like those listed above)
will never be patchable anyway, because we never patch an in-use
function (these functions are always in the threads' backtraces). This
further diminishes the benefit of parking/freezing kernel threads.

--
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/