Re: [PATCH] kexec: Fix reboot race during device_shutdown()
From: Joel Fernandes
Date: Tue Oct 10 2023 - 16:37:47 EST
On Mon, Oct 9, 2023 at 11:30 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> writes:
>
> > On Mon, Oct 2, 2023 at 2:18 PM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > [..]
> >> > > Such freezing is already being done if kernel supports KEXEC_JUMP and
> >> > > kexec_image->preserve_context is true. However, doing it if either of these are
> >> > > not true prevents crashes/races.
> >> >
> >> > The KEXEC_JUMP case is something else entirely. It is supposed to work
> >> > like suspend to RAM. Maybe reboot should as well, but I am
> >> > uncomfortable making a generic device fix kexec specific.
> >>
> >> I see your point of view. I think regular reboot should also be fixed
> >> to avoid similar crash possibilities. I am happy to make a change for
> >> that similar to this patch if we want to proceed that way.
> >>
> >> Thoughts?
> >
> > Just checking how we want to proceed, is the consensus that we should
> > prevent kernel crashes without relying on userspace stopping all
> > processes? Should we fix regular reboot syscall as well and not just
> > kexec reboot?
>
> It just occurred to me there is something very fishy about all of this.
>
> What userspace do you have using kexec (not kexec on panic) that doesn't
> preform the same userspace shutdown as a normal reboot?
>
> Quite frankly such a userspace is buggy, and arguably that is where you
> should start fixing things.
It is a simple unit test that tests kexec support by kexec-rebooting
the kernel. I don't think SIGSTOP/SIGKILL'ing during kexec-reboot is
ideal because in a real panic-on-kexec type crash, that may not happen
and so does not emulate the real world that well. I think we want the
kexec-reboot to do a *reboot* without crashing the kernel while doing
so. Ricardo/Steve can chime on what they feel as well.
> That way you can get the orderly shutdown
> of userspace daemons/services along with an orderly shutdown of
> everything the kernel is responsible for.
Fixing in userspace is an option but people are not happy that the
kernel can crash like that.
> At the kernel level a kexec reboot and a normal reboot have been
> deliberately kept as close as possible. Which is why I say we should
> fix it in reboot.
You mean fix it in userspace?
thanks,
- Joel