Re: [PATCHv13 05/16] x86/uaccess: Provide untagged_addr() and remove tags before address check

From: Linus Torvalds
Date: Tue Dec 27 2022 - 14:11:13 EST


On Mon, Dec 26, 2022 at 7:08 PM Kirill A. Shutemov
<kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
>
> --- a/arch/x86/include/asm/uaccess.h
> +++ b/arch/x86/include/asm/uaccess.h
> @@ -21,6 +22,37 @@ static inline bool pagefault_disabled(void);
> # define WARN_ON_IN_IRQ()
> #endif
>
> +#ifdef CONFIG_X86_64

I think this should be CONFIG_ADDRESS_MASKING or something like that.

This is not a "64 vs 32-bit feature". This is something else.

Even if you then were to select it unconditionally for 64-bit kernels
(but why would you?) it reads better if the #ifdef's make sense.

> +#define __untagged_addr(mm, addr) ({ \
> + u64 __addr = (__force u64)(addr); \
> + s64 sign = (s64)__addr >> 63; \
> + __addr &= READ_ONCE((mm)->context.untag_mask) | sign; \

Now the READ_ONCE() doesn't make much sense. There shouldn't be any
data races on that thing.

Plus:

> +#define untagged_addr(addr) __untagged_addr(current->mm, addr)

I think this should at least allow caching it in 'current' without the
mm indirection.

In fact, it might be even better off as a per-cpu variable.

Because it is now in somewhat crititcal code sections:

> -#define get_user(x,ptr) ({ might_fault(); do_get_user_call(get_user,x,ptr); })
> +#define get_user(x,ptr) \
> +({ \
> + might_fault(); \
> + do_get_user_call(get_user,x,untagged_ptr(ptr)); \
> +})

This is disgusting and wrong.

The whole reason we do do_get_user_call() as a function call is
because we *don't* want to do this kind of stuff at the call sites. We
used to inline it all, but with all the clac/stac and access_ok
checks, it all just ended up ballooning so much that it was much
better to make it a special function call with particular calling
conventions.

That untagged_ptr() should be done in that asm function, not in every call site.

Now, the sad part is that we got *rid* of all this kind of crap not
that long ago when Christoph cleaned up the old legacy set_fs() mess,
and we were able to make the task limit be a constant (ok, be _two_
constants, depending on LA57). So we'd have to re-introduce that nasty
"look up task size dynamically". See commit 47058bb54b57 ("x86: remove
address space overrides using set_fs()") for the removal that would
have to be re-instated.

But see above about "maybe it should be a per-cpu variable" - and
making that ALTERNATIVE th8ing even nastier.

Another alternative mght be to *only* test the sign bit in the
get_user/put_user functions, and just take the fault instead. Right
now we warn about non-canonical addresses because it implies somebody
might have missed an access_ok(), but we'd just mark those
get_user/put_user accesses special.

That would get this all entirely off the critical path. Most other
address masking is for relatively rare things (ie mmap/munmap), but
the user accesses are hot.

Hmm?

Linus