Re: timer + fpu stuff locks up computer

From: Sergey Vlasov
Date: Sat Jun 12 2004 - 10:16:06 EST


On Sat, Jun 12, 2004 at 04:25:51PM +0200, Alexander Nyberg wrote:
> > --- linux-2.6.6/include/asm-i386/i387.h.fp-lockup 2004-05-10 06:33:06 +0400
> > +++ linux-2.6.6/include/asm-i386/i387.h 2004-06-12 17:25:56 +0400
> > @@ -51,7 +51,6 @@
> > #define __clear_fpu( tsk ) \
> > do { \
> > if ((tsk)->thread_info->status & TS_USEDFPU) { \
> > - asm volatile("fwait"); \
> > (tsk)->thread_info->status &= ~TS_USEDFPU; \
> > stts(); \
> > } \
>
> Sorry for this extremely informative mail but, doesn't work.
>
> Looks like the problem is only being delayed:
>
> Pid: 431, comm: sshd
> EIP: 0060:[<c0119f98>] CPU: 0
> EIP is at force_sig_info+0x48/0x80
> EFLAGS: 00000286 Not tainted (2.6.7-rc3-mm1)
> EAX: 00000000 EBX: de96d7d0 ECX: 00000007 EDX: 00000008
> ESI: 00000008 EDI: 00000286 EBP: de9e3dd4 DS: 007b ES: 007b
> CR0: 8005003b CR2: 080b2664 CR3: 1f48f000 CR4: 000002d0
> [<c0105560>] do_coprocessor_error+0x0/0x20
> [<c01054f2>] math_error+0xb2/0x120
> [<c01d2bb8>] fast_clear_page+0x8/0x50
...

Grrr. I was testing on a fairly generic kernel configuration which
did not include fast_clear_page()...

If the FPU state belongs to the userspace process, kernel_fpu_begin()
is safe even if some exceptions are pending. However, after
__clear_fpu() the FPU is "orphaned", and kernel_fpu_begin() does
nothing with it.

Replacing fwait with fnclex instead of removing it completely should
avoid the fault later. However, it looks like we really need the
proper fix: teach do_coprocessor_error() to recognize kernel-mode
faults and fix them up.
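
To make the reasoning above concrete, here is a small user-space toy
model of the state transitions. It is not kernel code and is only a
sketch under stated assumptions: the struct, the enum and every
model_* / run_scenario name are hypothetical, the TS_USEDFPU value is
arbitrary, and what happens after a fault is taken is left out.

```c
#include <stdbool.h>

#define TS_USEDFPU 0x0001   /* arbitrary value, for this model only */

/* The three __clear_fpu() variants discussed in this thread. */
enum clear_variant { CLEAR_FWAIT, CLEAR_NOTHING, CLEAR_FNCLEX };

struct model {
    unsigned status;     /* stands in for thread_info->status */
    bool pending;        /* an unmasked x87 exception is pending */
    int kernel_faults;   /* math faults taken while in kernel mode */
};

/* Model of __clear_fpu(): drop ownership of the FPU state. */
static void model_clear_fpu(struct model *m, enum clear_variant v)
{
    if (!(m->status & TS_USEDFPU))
        return;
    if (v == CLEAR_FWAIT && m->pending)
        m->kernel_faults++;      /* fwait reports the pending exception */
    if (v == CLEAR_FNCLEX)
        m->pending = false;      /* fnclex discards it without a fault */
    m->status &= ~TS_USEDFPU;    /* the FPU is now "orphaned" */
}

/* Model of kernel_fpu_begin(): only an owned state gets saved. */
static void model_kernel_fpu_begin(struct model *m)
{
    if (m->status & TS_USEDFPU) {
        /* assumed: saving the state also leaves nothing pending */
        m->pending = false;
        m->status &= ~TS_USEDFPU;
    }
    /* orphaned FPU: nothing is done with its contents */
}

/* Model of kernel FPU/MMX use, e.g. in fast_clear_page(). */
static void model_kernel_fpu_use(struct model *m)
{
    if (m->pending)
        m->kernel_faults++;      /* stale exception fires in kernel mode */
}

/* Start with a user-owned FPU and a pending exception, then let the
 * kernel use the FPU, optionally after __clear_fpu(). Returns the
 * number of kernel-mode faults counted by the model. */
int run_scenario(enum clear_variant v, bool call_clear_fpu)
{
    struct model m = { TS_USEDFPU, true, 0 };

    if (call_clear_fpu)
        model_clear_fpu(&m, v);
    model_kernel_fpu_begin(&m);
    model_kernel_fpu_use(&m);
    return m.kernel_faults;
}
```

In this model run_scenario(CLEAR_NOTHING, true) counts one fault (the
delayed failure from the trace above), run_scenario(CLEAR_FNCLEX, true)
counts none, and skipping __clear_fpu() altogether also counts none,
because kernel_fpu_begin() still sees TS_USEDFPU and saves the state.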
