Re: Simplify load_unaligned_zeropad() (was Re: [GIT PULL] Ceph updates for 5.20-rc1)

From: Peter Zijlstra
Date: Tue Aug 16 2022 - 06:16:08 EST


On Mon, Aug 15, 2022 at 03:49:44PM -0700, Linus Torvalds wrote:
> On Mon, Aug 15, 2022 at 1:09 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > I'm not at all suggesting we do this; but it might be
> > insn_get_addr_ref() does what is needed.
>
> .. you didn't suggest it at all, but I started doing it anyway.

> So since I was tricked into writing this patch, and it's even tested
> (the second attachment has a truly stupid patch with my test-case), I
> think it's worth doing.

Haha, couldn't help yourself eh ;-)

> Comments? I left your "Acked-by" from the previous version of this
> thing, so holler now if you think this got too ugly in the meantime..

That's quite allright, a few nits below, but overall I like it. And yes
it's a bit over the top, but it's important to have fun..

> diff --git a/arch/x86/include/asm/extable_fixup_types.h b/arch/x86/include/asm/extable_fixup_types.h
> index 503622627400..b53f1919710b 100644
> --- a/arch/x86/include/asm/extable_fixup_types.h
> +++ b/arch/x86/include/asm/extable_fixup_types.h
> @@ -64,4 +64,6 @@
> #define EX_TYPE_UCOPY_LEN4 (EX_TYPE_UCOPY_LEN | EX_DATA_IMM(4))
> #define EX_TYPE_UCOPY_LEN8 (EX_TYPE_UCOPY_LEN | EX_DATA_IMM(8))
>
> +#define EX_TYPE_ZEROPAD 20 /* load ax from dx zero-padded */

This comment is now woefully incorrect.

> +
> #endif
> diff --git a/arch/x86/include/asm/word-at-a-time.h b/arch/x86/include/asm/word-at-a-time.h
> index 8338b0432b50..46b4f1f7f354 100644
> --- a/arch/x86/include/asm/word-at-a-time.h
> +++ b/arch/x86/include/asm/word-at-a-time.h
> @@ -77,58 +77,18 @@ static inline unsigned long find_zero(unsigned long mask)
> * and the next page not being mapped, take the exception and
> * return zeroes in the non-existing part.
> */
> static inline unsigned long load_unaligned_zeropad(const void *addr)
> {
> unsigned long ret;
>
> + asm volatile(
> "1: mov %[mem], %[ret]\n"
> "2:\n"
> + _ASM_EXTABLE_TYPE(1b, 2b, EX_TYPE_ZEROPAD)
> + : [ret] "=r" (ret)
> : [mem] "m" (*(unsigned long *)addr));

That looks delightfully simple :-)

>
> return ret;
> }
>
> #endif /* _ASM_WORD_AT_A_TIME_H */
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 331310c29349..60814e110a54 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -41,6 +41,59 @@ static bool ex_handler_default(const struct exception_table_entry *e,
> return true;
> }
>
> +/*
> + * This is the *very* rare case where we do a "load_unaligned_zeropad()"
> + * and it's a page crosser into a non-existent page.
> + *
> + * This happens when we optimistically load a pathname a word-at-a-time
> + * and the name is less than the full word and the next page is not
> + * mapped. Typically that only happens for CONFIG_DEBUG_PAGEALLOC.
> + *
> + * NOTE! The faulting address is always a 'mov mem,reg' type instruction
> + * of size 'long', and the exception fixup must always point to right
> + * after the instruction.
> + */
> +static bool ex_handler_zeropad(const struct exception_table_entry *e,
> + struct pt_regs *regs,
> + unsigned long fault_addr)
> +{
> + struct insn insn;
> + const unsigned long mask = sizeof(long) - 1;
> + unsigned long offset, addr, next_ip, len;
> + unsigned long *reg;
> +
> + next_ip = ex_fixup_addr(e);
> + len = next_ip - regs->ip;
> + if (len > MAX_INSN_SIZE)
> + return false;
> +
> + if (insn_decode(&insn, (void *) regs->ip, len, INSN_MODE_KERN))
> + return false;

We have insn_decode_kernel() for exactly this (very) common case.

if (insn_decode_kernel(&insn, (void *)regs->ip))
return false;

> + if (insn.length != len)
> + return false;
> +
> + if (insn.opcode.bytes[0] != 0x8b)
> + return false;

I was wondering if we want something like MOV_INSN_OPCODE for 0x8b to
enhance readability, otoh it's currently 0x8b all over the place, so
whatever. At some point you gotta have the insn tables with you anyway.

> + if (insn.opnd_bytes != sizeof(long))
> + return false;
> +
> + addr = (unsigned long) insn_get_addr_ref(&insn, regs);
> + if (addr == ~0ul)
> + return false;
> +
> + offset = addr & mask;
> + addr = addr & ~mask;
> + if (fault_addr != addr + sizeof(long))
> + return false;
> +
> + reg = insn_get_modrm_reg_ptr(&insn, regs);
> + if (!reg)
> + return false;
> +
> + *reg = *(unsigned long *)addr >> (offset * 8);
> + return ex_handler_default(e, regs);
> +}

Yep, that all looks about right.

> +
> static bool ex_handler_fault(const struct exception_table_entry *fixup,
> struct pt_regs *regs, int trapnr)
> {
> @@ -217,6 +270,8 @@ int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
> return ex_handler_sgx(e, regs, trapnr);
> case EX_TYPE_UCOPY_LEN:
> return ex_handler_ucopy_len(e, regs, trapnr, reg, imm);
> + case EX_TYPE_ZEROPAD:
> + return ex_handler_zeropad(e, regs, fault_addr);
> }
> BUG();
> }