Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())

From: Ingo Molnar
Date: Sat Feb 21 2015 - 13:30:16 EST

Next message: Florian Westphal: "Re: 1e918876 breaks r8169 (linux-3.18+)"
Previous message: Jonathan Cameron: "Re: [PATCH 2/2] iio: accel: kxcjk-1013: optimize i2c transfers in trigger handler"
In reply to: Josh Poimboeuf: "Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())"
Next in thread: Jiri Kosina: "Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())"

* Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:

> On Fri, Feb 20, 2015 at 10:46:13PM +0100, Vojtech Pavlik wrote:
> > On Fri, Feb 20, 2015 at 08:49:01PM +0100, Ingo Molnar wrote:
> >
> > > I.e. it's in essence the strong stop-all atomic
> > > patching model of 'kpatch', combined with the
> > > reliable avoidance of kernel stacks that 'kgraft'
> > > uses.
> >
> > > That should be the starting point, because it's the
> > > most reliable method.
> >
> > In the consistency models discussion, this was marked
> > the "LEAVE_KERNEL+SWITCH_KERNEL" model. It's indeed the
> > strongest model of all, but also comes at the highest
> > cost in terms of impact on running tasks. It's so high
> > (the interruption may be seconds or more) that it was
> > deemed not worth implementing.
>
> Yeah, this is way too disruptive to the user.
>
> Even the comparatively tiny latency caused by kpatch's
> use of stop_machine() was considered unacceptable by
> some.

Unreliable, unrobust patching is even more disruptive...

What I think makes it long term fragile is that we combine
two unrobust, unlikely mechanisms: the chance that a task
just happens to execute a patched function, with the chance
that debug information is unreliable.

For example tracing patching got debugged to a fair degree
because we rely on the patching for actual tracing
functionality. Even with that relatively robust usage model
we had our crises ...

I just don't see how a stack backtrace based live patching
method can become robust in the long run.

> Plus a lot of processes would see EINTR, causing more
> havoc.

Parking threads safely in user mode does not require the
propagation of syscall interruption to user-space.

(It does have some other requirements, such as making all
syscalls interruptible by a 'special' signalling method
that only live patching triggers - even syscalls that are
uninterruptible under the normal ABI, such as sys_sync().)
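
As a user-space analogy only (this is not the live patching mechanism
itself, and not the 'special' signalling method, which is not specified
here), the existing POSIX SA_RESTART flag already shows that an
interrupted syscall need not surface as EINTR. The sketch below blocks
in read() on an empty pipe and lets SIGALRM interrupt it. The names
interrupted_read(), on_alarm() and wake_fd are hypothetical, and the
50 ms timer is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Write end of the pipe; the handler uses it to make data available. */
static int wake_fd = -1;

/* Async-signal-safe handler: write one byte so a restarted read() can finish. */
static void on_alarm(int sig)
{
    int saved_errno = errno;
    const char byte = 'x';
    ssize_t r = write(wake_fd, &byte, 1);

    (void)sig;
    (void)r;
    errno = saved_errno;
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns read()'s return value (or -2 on setup failure);
 * *err receives errno on failure, 0 otherwise.
 */
static long interrupted_read(int use_sa_restart, int *err)
{
    int fds[2];
    struct sigaction sa, old;
    struct itimerval timer;
    char buf;
    long n;

    *err = 0;
    if (pipe(fds) != 0) {
        *err = errno;
        return -2;
    }
    wake_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    /* With SA_RESTART the kernel restarts read() after the handler returns. */
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    if (sigaction(SIGALRM, &sa, &old) != 0) {
        *err = errno;
        close(fds[0]);
        close(fds[1]);
        return -2;
    }

    /* One-shot timer: deliver SIGALRM about 50 ms from now. */
    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 50000;
    setitimer(ITIMER_REAL, &timer, NULL);

    n = read(fds[0], &buf, 1);
    if (n < 0)
        *err = errno;

    sigaction(SIGALRM, &old, NULL);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With use_sa_restart set, the caller should simply see read() return 1
and never learn that it was interrupted; without it, the same
interruption is reported as -1 with errno set to EINTR.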

On the other hand, if it's too slow, people will work on
improving signal propagation latencies: making syscalls
more readily interruptible and more seamlessly restartable
has various other advantages beyond live kernel patching.

I.e. it's a win-win scenario and will improve various areas
of the kernel in terms of syscall interruptibility
latencies.

Thanks,

Ingo