Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())

From: Ingo Molnar
Date: Sat Feb 21 2015 - 14:48:52 EST



* Jiri Kosina <jkosina@xxxxxxx> wrote:

> On Sat, 21 Feb 2015, Ingo Molnar wrote:
>
> > > But admittedly, if we reserve a special sort-of
> > > signal for making the tasks pass through a safe
> > > checkpoint (and make them queue there (your solution)
> > > or make them just pass through it and continue
> > > (current kGraft)), it might reduce the time this
> > > effort needs considerably.
> >
> > Well, I think the 'simple' method has another
> > advantage: it can only work if all those problems
> > (kthreads, parking machinery) are solved, because the
> > patching will occur only when everything is quiescent.
> >
> > So no shortcuts are allowed, by design. It starts from
> > a fundamentally safe, robust base, while all the other
> > approaches I've seen were developed in a 'let's get the
> > patching to work, then iteratively try to make it
> > safer' fashion, which really puts the cart before the
> > horse.
> >
> > So to put it more bluntly: I don't subscribe to the
> > whole 'consistency model' nonsense: that's just crazy
> > talk IMHO.
> >
> > Make it fundamentally safe from the very beginning: the
> > 'simple method' I suggested _won't live patch the
> > kernel_ if the mechanism has a bug and some kthread or
> > task does not park. See the difference?
>
> I see the difference, but I am afraid you are simplifying
> the situation a little bit too much.
>
> There will always be properties of patches that will make
> them inapplicable in a "live patching" way by design.
> Think of data structure layout changes (*).

Yes.

> Or think of a kernel that has some 3rd party vendor
> module loaded, and this module spawning a kthread that is
> not capable of parking itself.

The kernel will refuse to live patch until the module is
fixed. It is a win by any measure.

> Or think of patching __notrace__ functions.

Why would __notrace__ functions be a problem in the
'simple' method? Live patching with this method will work
even if ftrace is not built in, we can just patch out the
function in the simplest possible fashion, because we do it
atomically and don't have to worry about per task
'transition state' - like kGraft does.

It's a massive simplification and there's no need to rely
on ftrace's mcount trick. (Sorry Steve!)
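
To make the 'simple' method concrete, here is a toy user-space
sketch of the rule: park every task, patch only if all of them
parked, otherwise refuse and leave the old function in place.
It is not kernel code. Every name in it (Task, can_park,
try_live_patch) is made up for illustration, and a dictionary
entry stands in for the patched function.

```python
# Toy user-space model of the 'simple' method; NOT kernel code.
# All names (Task, can_park, try_live_patch) are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    can_park: bool = True   # a buggy kthread is modelled as can_park=False
    parked: bool = False

    def park(self) -> bool:
        # A well-behaved task reaches its safe checkpoint and queues there.
        self.parked = self.can_park
        return self.parked

    def unpark(self) -> None:
        self.parked = False


def try_live_patch(tasks: List[Task],
                   functions: Dict[str, Callable],
                   name: str,
                   new_impl: Callable) -> bool:
    """Swap functions[name] only if every task is parked."""
    try:
        for task in tasks:
            if not task.park():
                # One task failed to park: refuse, patch nothing.
                return False
        # Everything is quiescent, so one plain replacement is enough;
        # there is no per-task 'transition state' to track.
        functions[name] = new_impl
        return True
    finally:
        # Release all tasks whether or not the patch was applied.
        for task in tasks:
            task.unpark()
```

With a task list that contains one entry created with
can_park=False, try_live_patch() returns False and the old
function stays in place, which is the 'refuse to live patch'
behaviour described above.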

> Etc.
>
> So it's not black and white, it's really a rather
> philosophical question where to draw the line (and make
> sure that everyone is aware of where the line is and what
> it means).

Out of the three examples you mentioned, two are actually
an advantage to begin with - so I'd suggest you stop
condescending to me ...

> This is exactly why we came up with consistency models --
> it allows you to draw the line at well-defined places.

To still be blunt: they are snake oil, a bit like 'security
modules': allowing upstream merges by consensus between
competing pieces of crap, instead of coming up with a
single good design that we can call the Linux kernel live
patching method ...

Thanks,

Ingo