Re: [PATCH 02/23] x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()

From: Sebastian Andrzej Siewior
Date: Fri Nov 09 2018 - 18:25:30 EST


On 2018-11-09 19:52:02 [+0100], Borislav Petkov wrote:
> On Fri, Nov 09, 2018 at 06:35:21PM +0100, Sebastian Andrzej Siewior wrote:
> > fpu__drop() sets ->initialized to 0. As a result the context switch
>
> "... the context switch path landing in switch_fpu_prepare()... " is what you
> mean, right?
I mean both. switch_fpu_prepare() while the task is leaving and then
switch_fpu_finish() while the task is coming back. But yes.

> > will not save current FPU registers and so _not_ write to fpu->state.
> > This also means that CPU's FPU register will be random (inherited from
> > the last context)
>
> You mean, the FPU regs will have random values, yes.
Correct. Same as for kernel threads.

> > after the context switch. This is also true for usage
> > in softirq via kernel_fpu_begin().
>
> So far so good.
>
> Except maybe because I'm dense about FPU, I still am missing something.
>
> You have this path:
>
> __fpu__restore_sig
> |-> fpu__clear
> |-> fpu__drop
>
> and that happens on the sigreturn() path.
>
> Now, the context switch happens ... when exactly?
>
> After the sigreturn is done?

Is fpu__clear() correct here? If so, a context switch after
->initialized has been set to 1 wouldn't matter because in the end the
register state is restored from init_fpstate and not from the task's FPU
struct.

>
> It must be because then you'd get that ->state corruption after
> ->initialized has been cleared.
>
> Right?

I might have gotten your question wrong. If you quote the code and try
again, I will do so, too :)

> <snip a bunch of stuff, we'll get back to it later>
>
> > So. The fix would be:
> > @@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size)
> > sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
> > }
> >
> > + local_bh_disable();
> > fpu->initialized = 1;
> > - preempt_disable();
> > fpu__restore(fpu);
> > - preempt_enable();
> > + local_bh_enable();
> >
> > return err;
> > } else {
> >
> > local_bh_disable() due to possible kernel_fpu_begin() usage in softirq.
> > How much do we care here about a theoretical race on 32bit anyway? I
> > don't think someone complained :) I would have to rebase my queue
> > otherwise.
>
> Funny, you should mention that.
>
> But this very much rings a bell about a very elusive bug we had on
> 32-bit at the time. See attached mbox (yeah, the web archives were crap
> and couldn't find the links so I'm sending you the whole thread).
>
> And at the time Ingo said that there's something still missing about
> *why* it would happen.
>
> And I think it is this context switch happening right after the
> sigreturn - *AFAICT* - which would cause this.
>
> I could very well be off but this smells very similar to your thing.

So checking out v4.5-rc3-15-g58122bf1d856a, __fpu__restore_sig() is
something like this:

| fpu__drop(fpu);
...
| fpu->fpstate_active = 1;
X
| if (use_eager_fpu()) {
| preempt_disable();
| fpu__restore(fpu);
| preempt_enable();
| }

fpu__drop() sets fpstate_active & fpregs_active to 0[1]. A context switch
at X would _not_ save the current FPU registers, and so would not
overwrite what was prepared, because fpregs_active should still be zero.
Now on the switch back to the task, fpstate_active was set, which means
fpu.preload might be true. If so, it would load the FPU registers and set
fpregs_active to 1. Later fpu__restore() would try the same and
fpregs_activate() would trigger the warning because fpregs_active was
already set to 1.
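
To make the ordering easier to follow, here is a small user-space model of
the sequence above. It is only a sketch and not kernel code: struct
fpu_model, the model_*() helpers and the "warnings" counter are all made up
for illustration, and the context switch is reduced to the two flag checks
described above (no save unless fpregs_active is set, preload only if
fpstate_active is set).

```c
#include <stdbool.h>

/* Hypothetical model of the two flags; not the kernel's struct fpu. */
struct fpu_model {
	bool fpstate_active;	/* task has a valid FPU state in memory */
	bool fpregs_active;	/* task's FPU state is live in the registers */
	int warnings;		/* how often the modelled warning fired */
};

/* Models fpu__drop(): both flags are cleared. */
static void model_drop(struct fpu_model *f)
{
	f->fpstate_active = false;
	f->fpregs_active = false;
}

/* Models fpregs_activate(): warns if the registers are already active. */
static void model_fpregs_activate(struct fpu_model *f)
{
	if (f->fpregs_active)
		f->warnings++;
	f->fpregs_active = true;
}

/* Models the task being switched out and coming back (simplified). */
static void model_context_switch(struct fpu_model *f, bool preload)
{
	/* Leaving: registers are saved only if they were active. */
	if (f->fpregs_active)
		f->fpregs_active = false;

	/* Coming back: with fpstate_active set, a preload may happen. */
	if (f->fpstate_active && preload)
		model_fpregs_activate(f);
}

/* Models the __fpu__restore_sig() excerpt; returns the warning count. */
int model_restore_sig(bool switch_at_x, bool preload)
{
	struct fpu_model f = { .fpstate_active = true, .fpregs_active = true };

	model_drop(&f);
	f.fpstate_active = true;

	if (switch_at_x)			/* this is point X */
		model_context_switch(&f, preload);

	model_fpregs_activate(&f);		/* the fpu__restore() step */
	return f.warnings;
}
```

In this model, model_restore_sig(true, true) is the only combination that
counts a warning: the switch back at X already activated the registers, so
the later activation in the fpu__restore() step sees fpregs_active == 1.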

> Hmmm.
So I just came up with a possible hard-to-trigger case, and a robot had
already triggered it a while back. Well, CONFIG_PREEMPT=y is also there
so it matches this part of the story. But you connected the dots.

[1] Side note: in my early research it took a while to notice that
fpstate_active and fpregs_active were two different things. My brain
used fp.*_active for matching. It also added to my confusion that
those were renamed and removed...

Sebastian