Re: [PATCH] x86/fpu: Fix potential NULL dereference in avx512_status()

From: Sohil Mehta
Date: Fri Jul 18 2025 - 02:21:53 EST


On 7/17/2025 12:21 PM, Sohil Mehta wrote:
> On 7/17/2025 2:43 AM, Fushuai Wang wrote:
>> The avx512_status() function would then dereference this
>> NULL pointer via READ_ONCE(x86_task_fpu(task)->avx512_timestamp).
>> when reading /proc/*/arch_status, causing a kernel NULL pointer dereference
>> and system will crash.
>>
>
> The kernel seems to assume that a Kthread would never call
> x86_task_fpu(). That assumption is breaking in this scenario, which
> causes the below issue.
>

This concern was discussed while adding the checks:
https://lore.kernel.org/all/ZmFziN0i10sILaIo@xxxxxxxxx/

Adding a few folks who were involved in the discussion that time.

> Can you please share any other warnings that were triggered before this
> Oops message? Also, I'll try to generate this locally. Any specific
> configuration needed for reproducing this apart from CONFIG_X86_DEBUG_FPU?
>

I was able to reproduce this on a system with X86_FEATURE_AVX512F. The
issue only happens while reading arch_status on a kthread.

$cat /proc/[kthread]/arch_status => NULL pointer exception
$cat /proc/[user thread]/arch_status => No issue seen

Can you confirm that you are seeing the same behavior?

Unfortunately, avx512_timestamp resides within struct fpu. So getting
that value for a kthread would mean going through x86_task_fpu().

I am wondering if we ever need to expose the AVX512 usage for kernel
threads? If not, then we can do what you currently have but without the
CONFIG_X86_DEBUG_FPU restriction. All kernel threads would always print
the AVX512_elapsed_ms as -1.

However, this would be a user visible change so we should probably get
more inputs. I tried this experiment on an older kernel without the
above issue. Among all the active kthreads on my system a handful of
them show a valid value for AVX512 usage. The rest of them all show -1.

PID: 2594
CMD: avahi-daemon: running [SAP.local]
/proc/2594/arch_status content:
AVX512_elapsed_ms: 46032

PID: 2729
CMD: sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
/proc/2729/arch_status content:
AVX512_elapsed_ms: 396656

To keep the older behavior, we might need to consider moving
avx512_timestamp out of struct fpu. Though, I am uncertain about its
implication.


>>
>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 9aa9ac8399ae..16f813a42f42 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -1859,9 +1859,14 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
>> */
>> static void avx512_status(struct seq_file *m, struct task_struct *task)
>> {
>> - unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
>> + unsigned long timestamp = 0;
>> long delta;
>>
>> +#ifdef CONFIG_X86_DEBUG_FPU
>> + if (!(task->flags & PF_KTHREAD))
>> +#endif
>
> The logical code flow should not change based on X86_DEBUG_FPU. The fix
> for this issue likely needs to be somewhere else. Though, I am still
> working on identifying the exact root cause.
>
>> + timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
>> +
>> if (!timestamp) {
>> /*
>> * Report -1 if no AVX512 usage
>