floating point computation error caused by eagerfpu

From: Lei Chen
Date: Fri Mar 23 2018 - 03:23:49 EST


Hi,
I'm trying to figure out the root cause of a floating point
calculation error on kernel 4.4.98. My coworker ran a SHA1 test tool,
and the generated SHA1 did not match the expected value. Strangely, the
test passes on one VM. After a lot of comparison between this VM and
the bare metal x86-64 environment, we found a suspicious difference:
the VM uses 'lazy' mode FPU context switching while the bare metal
server uses 'eager' mode. I then rebuilt the kernel with
"eagerfpu=DISABLE" as the default, and I'm happy to see that the test
now passes across different platforms (different VMs and different x86
servers).
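
For comparing machines, here is a minimal sketch of a user-space
check for the mode in use. It relies on an assumption: that kernels of
this era report eager mode as an "eagerfpu" entry in the flags line of
/proc/cpuinfo. The helper names line_has_flag() and cpuinfo_has_flag()
are hypothetical and exist only for illustration.

```c
/* Sketch only: look for a CPU flag in /proc/cpuinfo. Names are hypothetical. */
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole whitespace-separated word in `line`. */
int line_has_flag(const char *line, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = line;

    if (n == 0)
        return 0;
    while ((p = strstr(p, flag)) != NULL) {
        int start_ok = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        int end_ok = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';
        if (start_ok && end_ok)
            return 1;
        p++; /* partial match only, keep scanning */
    }
    return 0;
}

/* Return 1 if the first "flags" line lists `flag`, 0 if it does not,
 * and -1 if /proc/cpuinfo cannot be opened. */
int cpuinfo_has_flag(const char *flag)
{
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "flags", 5) == 0) {
            found = line_has_flag(line, flag);
            break;
        }
    }
    fclose(f);
    return found;
}

#ifdef CPUINFO_FLAG_MAIN
int main(void)
{
    /* "eagerfpu" as the flag name is an assumption, see the text above. */
    printf("eagerfpu flag: %d\n", cpuinfo_has_flag("eagerfpu"));
    return 0;
}
#endif
```

Building with -DCPUINFO_FLAG_MAIN gives a small standalone program
that can be run on both the VM and the bare metal server.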

We don't have any custom FPU settings or modifications to the native
Linux 4.4.98 kernel code. As I understand it, the kernel chooses
eagerfpu mode automatically during boot according to the CPU's
capabilities. It should just work if the CPU supports eager mode, but
the test result suggests that there might be FPU context corruption.
Having googled around, I couldn't find a similar report. Could the
FPU experts shed some light on this issue?
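
For anyone who wants to look for this kind of corruption without a
SHA1 tool, below is a minimal sketch of a user-space consistency check.
It is not our actual test tool, and the names (fp_workload,
child_check, run_fpu_check) are hypothetical. The idea is that several
processes repeat the same deterministic floating point computation
while the scheduler switches between them; if FPU state were corrupted
across a context switch, a repeated result could differ from the first
one. A clean run does not prove that the FPU handling is correct.

```c
/* Sketch only: user-space FPU consistency check. All names are hypothetical. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Deterministic workload: the same arguments must always give the same result. */
double fp_workload(unsigned seed, int steps)
{
    double acc = 1.0 + (double)seed;

    for (int i = 1; i <= steps; i++)
        acc = acc * 1.000001 + (double)i / (double)(seed + 3);
    return acc;
}

/* Repeat the workload and compare each result with the first one.
 * Returns 0 if every round matched, 1 on the first mismatch. */
int child_check(unsigned seed, int steps, int rounds)
{
    /* The volatile read keeps the compiler from merging the repeated calls. */
    volatile unsigned vseed = seed;
    double reference = fp_workload(vseed, steps);

    for (int r = 0; r < rounds; r++) {
        if (fp_workload(vseed, steps) != reference)
            return 1;
    }
    return 0;
}

/* Fork nprocs children that each run child_check().
 * Returns the number of children that reported a mismatch or died
 * abnormally, or -1 if fork() failed. */
int run_fpu_check(int nprocs, int steps, int rounds)
{
    int started = 0, bad = 0, fork_failed = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0)
            _exit(child_check((unsigned)i + 1, steps, rounds));
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            bad++;
    }
    return fork_failed ? -1 : bad;
}

#ifdef FPU_CHECK_MAIN
int main(void)
{
    /* Process count and loop sizes are arbitrary illustrative values. */
    int bad = run_fpu_check(16, 100000, 2000);

    printf("children with mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
#endif
```

Building with -DFPU_CHECK_MAIN gives a standalone program. Starting
more processes than there are CPUs should make context switches in
the middle of the computation more likely.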

Thanks,
Lei Chen