Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()

From: Siarhei Liakh
Date: Tue Jun 19 2018 - 15:56:31 EST


On Tue, 19 Jun 2018, Andy Lutomirski wrote: 

> On Jun 19, 2018, at 9:15 AM, Siarhei Liakh <Siarhei.Liakh@xxxxxxxxxxxxxxxxx> wrote:
>
> > On Mon, 18 Jun 2018, Andy Lutomirski wrote:
> >
> > > > On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
> > > > <Siarhei.Liakh@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > fpu__drop() has an explicit fwait which under some conditions can trigger
> > > > > a fixable FPU exception while in kernel. Thus, we should attempt to fixup
> > > > > the exception first, and only call notify_die() if the fixup failed just
> > > > > like in do_general_protection(). The original call sequence incorrectly
> > > > > triggers KDB entry on debug kernels under particular FPU-intensive
> > > > > workloads. This issue had been privately observed, fixed, and tested
> > > > > on 4.9.98, while this patch brings the fix to the upstream.
> > > >
> > > > Reviewed-by: Andy Lutomirski <luto@xxxxxxxxxx>
> > > >
> > > > With the caveat that you are perpetuating what is arguably a bug in
> > > > some of the other entries: math_error() can now be called with IRQs
> > > > off and return with IRQs on.  If we actually start asserting good
> > > > behavior in the entry code, we'll need to fix this.
> > >
> > > Confused. math_error() is still invoked with interrupts off. What's
> > > different now is that notify_die() is called with interrupts conditionally
> > > enabled while upstream it's always called with interrupts disabled.
> >
> > I see that notify_die() is being called either way in upstream (e.g.
> > do_general_protection() and do_iret_error() vs. do_bounds(), etc.).
> > Is there some sort of general policy/guide documentation available
> > which outlines the expectations of notify_die(), as well as its notifiers?
>
> I doubt it.
>
> The right fix is to delete notify_die(), not to document it. Kernel debuggers should
> hook die() directly, and other users (if any) should be moved into the error handlers.

Got it. Unfortunately, this looks like a whole separate code refactoring project
which I cannot undertake at this time. In the meantime, this patch offers a fix for
an immediate issue (KDB tripped when it shouldn't) even if it does nothing to
address the deficiencies in the framework itself.
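
For anyone skimming the thread, the ordering change can be modeled roughly
as below. This is a userspace sketch only, not the patch itself: every
function and enum name in it is a hypothetical stand-in, the notifier stub
simply assumes a debugger such as KDB is registered and claims every event,
and the user-mode path is reduced to a single return value.

```c
#include <stdbool.h>

/*
 * Userspace model only. Every name below is a hypothetical stand-in,
 * not the kernel's real function or signature.
 */
enum outcome { FIXED_UP, NOTIFIER_STOPPED, DIED, SIGNAL_SENT };

static int notifier_calls;	/* how often the stub die notifier ran */

/* Stub: assume a registered debugger always claims the event. */
static bool notifier_stops(void)
{
	notifier_calls++;
	return true;
}

/* Model of the original ordering: the notifier runs before any fixup. */
static enum outcome math_error_notify_first(bool user_mode, bool has_fixup)
{
	if (notifier_stops())
		return NOTIFIER_STOPPED;
	if (!user_mode)
		return has_fixup ? FIXED_UP : DIED;
	return SIGNAL_SENT;
}

/* Model of the proposed ordering: try the fixup first in kernel mode. */
static enum outcome math_error_fixup_first(bool user_mode, bool has_fixup)
{
	if (!user_mode) {
		if (has_fixup)
			return FIXED_UP;
		return notifier_stops() ? NOTIFIER_STOPPED : DIED;
	}
	return SIGNAL_SENT;
}
```

In this model, a kernel-mode fault that has a fixup entry reaches the
notifier under the first ordering but never under the second, which is the
"KDB tripped when it shouldn't" case.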

Thank you.