Re: [PATCH v2 1/2] x86/fpu: Fix NULL dereference in avx512_status()

From: Dave Hansen
Date: Mon Jul 21 2025 - 18:33:42 EST


On 7/21/25 14:53, Sohil Mehta wrote:
> From: Fushuai Wang <wangfushuai@xxxxxxxxx>
>
> When CONFIG_X86_DEBUG_FPU is set, reading /proc/[kthread]/arch_status
> causes a NULL pointer dereference.
>
> For Kthreads tasks:
> proc_pid_arch_status()
> avx512_status()
> x86_task_fpu() => returns NULL when CONFIG_X86_DEBUG_FPU=y
> x86_task_fpu()->avx512_timestamp => NULL dereference

This seems to imply that CONFIG_X86_DEBUG_FPU _triggers_ the bug.

It certainly makes it obvious, but isn't there a bug with or without
CONFIG_X86_DEBUG_FPU? Even without it, I think it'll access out of the
init_task bounds.

> Kernel threads aren't expected to access FPU state directly. However,
> avx512_timestamp resides within struct fpu which lead to this unique
> situation.

What does this mean? Most kernel threads have a 'struct fpu', right?

> It is uncertain whether kernel threads use AVX-512 in a meaningful way
> that needs userspace reporting. For now, avoid reporting AVX-512 usage
> for kernel threads.

It would be idea if this was more explicit about how this changes the
ABI for kernel threads.

Could we make this more precise, please?

Report "AVX512_elapsed_ms: -1" for kernel tasks, whether they
use AVX-512 or not. This is the same value that is reported for
user tasks which have not been detected using AVX-512.

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 9aa9ac8399ae..a75077c645b6 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1855,19 +1855,18 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
> #ifdef CONFIG_PROC_PID_ARCH_STATUS
> /*
> * Report the amount of time elapsed in millisecond since last AVX512
> - * use in the task.
> + * use in the task. Report -1 if no AVX-512 usage.
> */
> static void avx512_status(struct seq_file *m, struct task_struct *task)
> {
> - unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
> - long delta;
> + unsigned long timestamp = 0;
> + long delta = -1;
>
> - if (!timestamp) {
> - /*
> - * Report -1 if no AVX512 usage
> - */
> - delta = -1;
> - } else {
> + /* Do not report AVX-512 usage for kernel threads */
> + if (!(task->flags & (PF_KTHREAD | PF_USER_WORKER)))
> + timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
> +
> + if (timestamp) {
> delta = (long)(jiffies - timestamp);
> /*
> * Cap to LONG_MAX if time difference > LONG_MAX

Can we just do:

unsigned long timestamp;
long delta;

if (task->flags & (PF_KTHREAD | PF_USER_WORKER))
return;

timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);

...

for now, please? That way, there's no made up value for kernel threads.
The value just isn't present in the file.