Re: [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible

From: Josh Poimboeuf
Date: Thu Jan 10 2019 - 12:20:15 EST

Next message: Arnd Bergmann: "[PATCH 02/11] time: Add struct __kernel_timex"
Previous message: Boris Brezillon: "Re: [GIT PULL] mtd: Fixes for 5.0-rc2"
In reply to: Nadav Amit: "Re: [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible"
Next in thread: Nadav Amit: "Re: [PATCH v3 5/6] x86/alternative: Use a single access in text_poke() where possible"

On Thu, Jan 10, 2019 at 09:32:23AM +0000, Nadav Amit wrote:
> > @@ -714,14 +714,39 @@ void *text_poke(void *addr, const void *opcode, size_t len)
> >  	}
> >  	BUG_ON(!pages[0]);
> >  	local_irq_save(flags);
> > +
> >  	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >  	if (pages[1])
> >  		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> > -	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> > -	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > +
> > +	vaddr = fix_to_virt(FIX_TEXT_POKE0) + ((unsigned long)addr & ~PAGE_MASK);
> > +
> > +	/*
> > +	 * Use a single access where possible.  Note that a single unaligned
> > +	 * multi-byte write will not necessarily be atomic on x86-32, or if the
> > +	 * address crosses a cache line boundary.
> > +	 */
> > +	switch (len) {
> > +	case 1:
> > +		WRITE_ONCE(*(u8 *)vaddr, *(u8 *)opcode);
> > +		break;
> > +	case 2:
> > +		WRITE_ONCE(*(u16 *)vaddr, *(u16 *)opcode);
> > +		break;
> > +	case 4:
> > +		WRITE_ONCE(*(u32 *)vaddr, *(u32 *)opcode);
> > +		break;
> > +	case 8:
> > +		WRITE_ONCE(*(u64 *)vaddr, *(u64 *)opcode);
> > +		break;
> > +	default:
> > +		memcpy((void *)vaddr, opcode, len);
> > +	}
> > +
>
> Even if Intel and AMD CPUs are guaranteed to run instructions from L1
> atomically, this may break instruction emulators, such as those that
> hypervisors use. On SMP VMs, they might not read instructions atomically
> when the VM's text_poke() races with the emulated instruction fetch.
>
> While I can't find a reason for hypervisors to emulate this instruction,
> smarter people might find ways to turn it into a security exploit.

Interesting point... but I wonder if it's a realistic concern. BTW,
text_poke_bp() also relies on undocumented behavior.

The entire instruction doesn't need to be read atomically; just the
32-bit call destination. Assuming the hypervisor is x86-64, and it uses
a 32-bit access to read the call destination (which seems logical), the
intra-cacheline reads will be atomic, as stated in the SDM.
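
To make the "intra-cacheline" condition concrete, here is a minimal
user-space sketch of the check it implies. It is an illustration only,
not code from the patch: CACHELINE_SIZE and within_cacheline() are
hypothetical names, and the 64-byte line size is an assumption that
holds for typical current x86 parts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHELINE_SIZE 64u

/*
 * Return true if the byte range [addr, addr + len) stays inside a single
 * cache line, i.e. the first and last byte map to the same line index.
 */
static bool within_cacheline(uintptr_t addr, size_t len)
{
	if (len == 0)
		return true;

	return (addr / CACHELINE_SIZE) == ((addr + len - 1) / CACHELINE_SIZE);
}
```

For example, a 4-byte call destination starting at offset 0x3c of a line
fits within it, while one starting at offset 0x3e straddles two lines,
which is the case the patch comment warns about.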

If the above assumptions are not true, and the hypervisor reads the call
destination non-atomically (which seems unlikely IMO), even then I don't
see how it could be realistically exploitable. It would just oops from
calling a corrupt address.
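
As a rough illustration of the "corrupt address" case, the sketch below
mixes the displacement bytes of an old and a new 5-byte "call rel32"
(opcode 0xe8) and computes the resulting target. It is a user-space
model under stated assumptions, not how any particular hypervisor
emulator fetches instructions: call_target() and torn_read() are
hypothetical helpers, the byte-granular tearing point is arbitrary, and
a little-endian host is assumed.

```c
#include <stdint.h>
#include <string.h>

/*
 * Target of a 5-byte "call rel32" located at address ip: the address of
 * the next instruction plus the sign-extended 32-bit displacement.
 * Assumes a little-endian host, matching x86 instruction encoding.
 */
static uint64_t call_target(uint64_t ip, const uint8_t insn[5])
{
	int32_t rel;

	memcpy(&rel, &insn[1], sizeof(rel));
	return ip + 5 + (int64_t)rel;
}

/*
 * Model a non-atomic read of the displacement: the first 'split' bytes
 * come from the new instruction, the remaining ones from the old one.
 * 'split' must be in the range 0..4.
 */
static void torn_read(uint8_t out[5], const uint8_t old_insn[5],
		      const uint8_t new_insn[5], size_t split)
{
	out[0] = 0xe8;
	memcpy(&out[1], &new_insn[1], split);
	memcpy(&out[1 + split], &old_insn[1 + split], 4 - split);
}
```

With an old displacement of 0x00001000, a new one of 0x00200040 and a
split after two bytes, the torn displacement is 0x00000040, so the
computed target matches neither the old nor the new destination.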

--
Josh