Re: [PATCH] x86: Call fixup_exception() before notify_die() in math_error()

From: Andy Lutomirski
Date: Tue Jun 19 2018 - 11:23:12 EST




> On Jun 18, 2018, at 11:23 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
>> On Mon, 18 Jun 2018, Andy Lutomirski wrote:
>>
>> On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
>> <Siarhei.Liakh@xxxxxxxxxxxxxxxxx> wrote:
>>>
>>> fpu__drop() has an explicit fwait which under some conditions can trigger
>>> a fixable FPU exception while in kernel. Thus, we should attempt to fixup
>>> the exception first, and only call notify_die() if the fixup failed just
>>> like in do_general_protection(). The original call sequence incorrectly
>>> triggers KDB entry on debug kernels under particular FPU-intensive
>>> workloads. This issue had been privately observed, fixed, and tested
>>> on 4.9.98, while this patch brings the fix to the upstream.
>>
>> Reviewed-by: Andy Lutomirski <luto@xxxxxxxxxx>
>>
>> With the caveat that you are perpetuating what is arguably a bug in
>> some of the other entries: math_error() can now be called with IRQs
>> off and return with IRQs on. If we actually start asserting good
>> behavior in the entry code, we'll need to fix this.
>
> Confused. math_error() is still invoked with interrupts off. What's
> different now is that notify_die() is called with interrupts conditionally
> enabled while upstream it's always called with interrupts disabled.

True, but I don't think that matters. What I'm grumbling about is that we can do cond_local_irq_enable() and then return without local_irq_disable().
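
For illustration only, the imbalance can be sketched as a user-space toy model. This is not the kernel code: every toy_* name below is hypothetical, the global flag merely stands in for the CPU interrupt flag, and toy_regs is an assumed stand-in for the saved register state that the real conditional helpers consult.

```c
#include <stdbool.h>

/* Toy user-space model; NOT kernel code. All toy_* names are hypothetical. */

static bool irqs_on; /* stands in for the CPU interrupt flag */

/* Assumed stand-in for the saved register state of the interrupted context. */
struct toy_regs {
	bool irqs_were_on; /* were IRQs enabled where the exception hit? */
};

/* Enable IRQs only if the interrupted context had them enabled. */
static void toy_cond_local_irq_enable(const struct toy_regs *regs)
{
	if (regs->irqs_were_on)
		irqs_on = true;
}

/* Undo the conditional enable before returning to the entry code. */
static void toy_cond_local_irq_disable(const struct toy_regs *regs)
{
	if (regs->irqs_were_on)
		irqs_on = false;
}

/* The pattern being grumbled about: enable, then return without disabling. */
static void toy_handler_unbalanced(const struct toy_regs *regs)
{
	toy_cond_local_irq_enable(regs);
	/* ... handle the exception ... */
}

/* A balanced variant: IRQs are off again by the time the handler returns. */
static void toy_handler_balanced(const struct toy_regs *regs)
{
	toy_cond_local_irq_enable(regs);
	/* ... handle the exception ... */
	toy_cond_local_irq_disable(regs);
}

/*
 * Call a handler the way the entry code would (IRQs off on entry) and
 * report the IRQ state the entry code sees once the handler returns.
 */
static bool toy_run(void (*handler)(const struct toy_regs *), bool irqs_were_on)
{
	struct toy_regs regs = { .irqs_were_on = irqs_were_on };

	irqs_on = false;
	handler(&regs);
	return irqs_on;
}
```

In this model, toy_run(toy_handler_unbalanced, true) comes back with IRQs on even though the handler was entered with them off, while the balanced variant always returns with IRQs off. An assertion in the entry code about the IRQ state on return would trip on the first case.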

Anyway, I think the patch is fine as is. We can unsuck the entry IRQ handling another day.

>
> Thanks,
>
> tglx