Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching

From: Masami Hiramatsu
Date: Tue May 06 2014 - 07:46:02 EST


(2014/05/06 3:43), Ingo Molnar wrote:
>
> * Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>
>> On Mon, May 05, 2014 at 08:26:38AM -0500, Josh Poimboeuf wrote:
>>> On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote:
>>>>
>>>> * Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>>>>
>>>>> [...]
>>>>>
>>>>> kpatch checks the backtraces of all tasks in stop_machine() to
>>>>> ensure that no instances of the old function are running when the
>>>>> new function is applied. I think the biggest downside of this
>>>>> approach is that stop_machine() has to idle all other CPUs during
>>>>> the patching process, so it inserts a small amount of latency (a few
>>>>> ms on an idle system).
>>>>
>>>> When live patching the kernel, how about achieving an even 'cleaner'
>>>> state for all tasks in the system: to freeze all tasks, as the suspend
>>>> and hibernation code (and kexec) does, via freeze_processes()?
>>>>
>>>> That means no tasks in the system have any real kernel execution
>>>> state, and there's also no problem with long-sleeping tasks, as
>>>> freeze_processes() is supposed to be fast as well.
>>>>
>>>> I.e. go for the most conservative live patching state first, and relax
>>>> it only once the initial model is upstream and is working robustly.
>>>
>>> I had considered doing this before, but the problem I found is
>>> that many kernel threads are unfreezable. So we wouldn't be able
>>> to check whether it's safe to replace any functions in use by those
>>> kernel threads.
>>
>> OTOH many kernel threads are parkable. Which achieves kind of
>> similar desired behaviour: the kernel threads then aren't running.
>>
>> And in fact we could implement freezing on top of park for kthreads.
>>
>> But unfortunately there are still quite some of them which don't
>> support parking.
>
> Well, if distros are moving towards live patching (and they are!),
> then it looks rather necessary to me that something as scary as flipping
> out live kernel instructions with substantially different code should
> be as safe as possible, and only then fast.

Agreed.
At this point, I think we should take the safer approach to live patching.

However, I also think that if users can accept that kind of freezing
wait time, they can also accept kexec-based "checkpoint-restart" patching.
So I think the final goal for kpatch should be live patching without
stopping the machine. I'm discussing that issue on github #138, but it is
off-topic here. :)

> If a kernel refuses to patch with certain threads running, that will
> drive those kernel threads being fixed and such. It's a deterministic,
> recoverable, reportable bug situation, so fixing it should be fast.

It would be good to get those fixed. As Frederic said, we can make all
kthreads parkable.
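
To make the "refuse to patch" behaviour concrete, here is a rough
user-space sketch of the kind of check discussed above: walk every task's
saved stack frames and fail with -EBUSY if any frame falls inside the old
function. This is only a model under stated assumptions, not kpatch's
actual code. struct func_range, struct task_model and
patch_safety_check() are hypothetical names invented for illustration,
and a real check would run on real kernel stack traces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are NOT kpatch's real data structures. */
struct func_range {
	uintptr_t start;	/* first byte of the old function */
	uintptr_t end;		/* one past the last byte of the old function */
};

struct task_model {
	const char *comm;		/* task name, useful for reporting */
	const uintptr_t *frames;	/* saved addresses from the task's stack */
	size_t nr_frames;
};

/* Return 1 if any saved frame of the task lies inside the old function. */
static int task_uses_func(const struct task_model *t,
			  const struct func_range *old)
{
	for (size_t i = 0; i < t->nr_frames; i++)
		if (t->frames[i] >= old->start && t->frames[i] < old->end)
			return 1;
	return 0;
}

/*
 * Return 0 if no task is executing the old function, otherwise -EBUSY.
 * On failure, *blocker (if non-NULL) points at the first offending task,
 * so the refusal is deterministic and reportable.
 */
static int patch_safety_check(const struct task_model *tasks, size_t nr_tasks,
			      const struct func_range *old,
			      const struct task_model **blocker)
{
	assert(old->start < old->end);

	if (blocker)
		*blocker = NULL;

	for (size_t i = 0; i < nr_tasks; i++) {
		if (task_uses_func(&tasks[i], old)) {
			if (blocker)
				*blocker = &tasks[i];
			return -EBUSY;
		}
	}
	return 0;
}
```

A caller would abort the patch on -EBUSY and report blocker->comm, which
is what would point people at the kthread that needs fixing.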

> We learned these robustness lessons the hard way with kprobes and
> ftrace dynamic code patching... which are utterly simple compared to
> live kernel patching!

Yeah, thanks for your help :)

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@xxxxxxxxxxx

