Re: [PATCH v3 01/13] x86/fpu/xstate: Avoid getting xstate address of init_fpstate if fpstate contains the component

From: Mingwei Zhang
Date: Fri Feb 24 2023 - 18:56:14 EST


On Wed, Feb 22, 2023, Chang S. Bae wrote:
> On 2/22/2023 10:40 AM, Mingwei Zhang wrote:
> > > > We have this [1]:
> > > >
> > > > if (fpu_state_size_dynamic())
> > > > mask &= (header.xfeatures | xinit->header.xcomp_bv);
> > > >
> > > > If header.xfeatures[18] = 0 then mask[18] = 0 because
> > > > xinit->header.xcomp_bv[18] = 0. Then, it won't hit that code. So, I'm
> > > > confused about the problem that you described here.
> > >
> > > Read the suggested changelog I wrote in my reply to Mingwei.
> > >
> > > TLDR:
> > >
> > > xsave.header.xfeatures[18] = 1
> > > xinit.header.xfeatures[18] = 0
> > > -> mask[18] = 1
> > > -> __raw_xsave_addr(xsave, 18) <- Success
> > > -> __raw_xsave_addr(xinit, 18) <- WARN
>
> Oh, sigh.. This should be caught last time.
>
> Hmm, then since we store init state for legacy ones [1], unless it is too
> aggressive, perhaps the loop can be simplified like this:
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 714166cc25f2..2dac6f5f3ade 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -1118,21 +1118,13 @@ void __copy_xstate_to_uabi_buf(struct membuf to,
> struct fpstate *fpstate,
> zerofrom = offsetof(struct xregs_state, extended_state_area);
>
> /*
> - * The ptrace buffer is in non-compacted XSAVE format. In
> - * non-compacted format disabled features still occupy state space,
> - * but there is no state to copy from in the compacted
> - * init_fpstate. The gap tracking will zero these states.
> + * Indicate which states to copy from fpstate. When not present in
> + * fpstate, those extended states are either initialized or
> + * disabled. They are also known to have an all zeros init state.
> + * Thus, remove them from 'mask' to zero those features in the user
> + * buffer instead of retrieving them from init_fpstate.
> */
> - mask = fpstate->user_xfeatures;

Do we need to change this line and the comments? I don't see any of
these was relevant to this issue. The original code semantic is to
traverse all user_xfeatures, if it is available in fpstate, copy it from
there; otherwise, copy it from init_fpstate. We do not assume the
component in init_fpstate (but not in fpstate) are all zeros, do we? If
it is safe to assume that, then it might be ok. But at least in this
patch, I want to keep the original semantics as is without the
assumption.
> -
> - /*
> - * Dynamic features are not present in init_fpstate. When they are
> - * in an all zeros init state, remove those from 'mask' to zero
> - * those features in the user buffer instead of retrieving them
> - * from init_fpstate.
> - */
> - if (fpu_state_size_dynamic())
> - mask &= (header.xfeatures | xinit->header.xcomp_bv);
> + mask = header.xfeatures;

Same here. Let's not adding this optimization in this patch.

>
> for_each_extended_xfeature(i, mask) {
> /*
> @@ -1151,9 +1143,8 @@ void __copy_xstate_to_uabi_buf(struct membuf to,
> struct fpstate *fpstate,
> pkru.pkru = pkru_val;
> membuf_write(&to, &pkru, sizeof(pkru));
> } else {
> - copy_feature(header.xfeatures & BIT_ULL(i), &to,
> + membuf_write(&to,
> __raw_xsave_addr(xsave, i),
> - __raw_xsave_addr(xinit, i),
> xstate_sizes[i]);
> }
> /*
>
> > Chang: to reproduce this issue, you can simply run the amx_test in the
> > kvm selftest directory.
>
> Yeah, I was able to reproduce it with this ptrace test:
>
> diff --git a/tools/testing/selftests/x86/amx.c
> b/tools/testing/selftests/x86/amx.c
> index 625e42901237..ae02bc81846d 100644
> --- a/tools/testing/selftests/x86/amx.c
> +++ b/tools/testing/selftests/x86/amx.c
> @@ -14,8 +14,10 @@
> #include <sys/auxv.h>
> #include <sys/mman.h>
> #include <sys/shm.h>
> +#include <sys/ptrace.h>
> #include <sys/syscall.h>
> #include <sys/wait.h>
> +#include <sys/uio.h>
>
> #include "../kselftest.h" /* For __cpuid_count() */
>
> @@ -826,6 +828,76 @@ static void test_context_switch(void)
> free(finfo);
> }
>
> +/* Ptrace test */
> +
> +static bool inject_tiledata(pid_t target)
> +{
> + struct xsave_buffer *xbuf;
> + struct iovec iov;
> +
> + xbuf = alloc_xbuf();
> + if (!xbuf)
> + fatal_error("unable to allocate XSAVE buffer");
> +
> + load_rand_tiledata(xbuf);
> +
> + memcpy(&stashed_xsave->bytes[xtiledata.xbuf_offset],
> + &xbuf->bytes[xtiledata.xbuf_offset],
> + xtiledata.size);
> +
> + iov.iov_base = xbuf;
> + iov.iov_len = xbuf_size;
> +
> + if (ptrace(PTRACE_SETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
> + fatal_error("PTRACE_SETREGSET");
> +
> + if (ptrace(PTRACE_GETREGSET, target, (uint32_t)NT_X86_XSTATE, &iov))
> + err(1, "PTRACE_GETREGSET");
> +
> + if (!memcmp(&stashed_xsave->bytes[xtiledata.xbuf_offset],
> + &xbuf->bytes[xtiledata.xbuf_offset],
> + xtiledata.size))
> + return true;
> + else
> + return false;
> +}
> +
> +static void test_ptrace(void)
> +{
> + pid_t child;
> + int status;
> +
> + child = fork();
> + if (child < 0) {
> + err(1, "fork");
> + } else if (!child) {
> + if (ptrace(PTRACE_TRACEME, 0, NULL, NULL))
> + err(1, "PTRACE_TRACEME");
> +
> + /* Use the state to expand the kernel buffer */
> + load_rand_tiledata(stashed_xsave);
> +
> + raise(SIGTRAP);
> + _exit(0);
> + }
> +
> + do {
> + wait(&status);
> + } while (WSTOPSIG(status) != SIGTRAP);
> +
> + printf("\tInject tile data via ptrace()\n");
> +
> + if (inject_tiledata(child))
> + printf("[OK]\tTile data was written on ptracee.\n");
> + else
> + printf("[FAIL]\tTile data was not written on ptracee.\n");
> +
> + ptrace(PTRACE_DETACH, child, NULL, NULL);
> + wait(&status);
> + if (!WIFEXITED(status) || WEXITSTATUS(status))
> + err(1, "ptrace test");
> +}
> +
> int main(void)
> {
> /* Check hardware availability at first */
> @@ -846,6 +918,8 @@ int main(void)
> ctxtswtest_config.num_threads = 5;
> test_context_switch();
>
> + test_ptrace();
> +
> clearhandler(SIGILL);
> free_stashed_xsave();
>
> Thanks,
> Chang
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/fpu/xstate.c#n386
>

Nice one. Yeah both ptrace and KVM are calling this function so the above
code would also be enough to trigger the bug.


Thanks.
-Mingwei