Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

From: Andy Lutomirski
Date: Thu Dec 06 2018 - 14:07:06 EST


On Thu, Dec 6, 2018 at 10:15 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> > inconsistent, sporadic handling of negatives. Here's our error code bits:
> >
> > /*
> > * Page fault error code bits:
> > *
> > * bit 0 == 0: no page found 1: protection fault
> > * bit 1 == 0: read access 1: write access
> > * bit 2 == 0: kernel-mode access 1: user-mode access
>
> No. Really not at all.
>
> Bit 2 is *not* "kernel vs user". Never has been. Never will be.
>
> It's a single bit that mixes up *three* different cases:
>
> - regular user mode access (value: 1)
>
> - regular CPL0 access (value: 0)
>
> - CPU system access (value: 0)
>
> and that third case really is important and relevant. And importantly,
> it can happen from user space.
>
> In fact, these days we possibly have a fourth case:
>
> - kernel access using wruss (value: 1)

Indeed.

>
> and I'd rather see just the numbers (which you have to know to decode)
> than see the simplified AND WRONG decoding of those numbers.

That's why the very next line in the OOPS explains this.

>
> Please don't ever confuse the fault U/S bit with "user vs kernel".
> It's just not true, and people should be very very aware of it now
> being true.
>
> If you care whether a page fault happened in user mode or not, you
> have to look at the register state (ie "user_mode(regs)").

The code we're arguing over was part of a patch set to fix this
confusion in the page fault code for real.

>
> Please call the U/S bit something else than "user" or "kernel".

Dunno. I kind of like following the SDM.

How do you like the attached patch?
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..f1f9e19646f5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -610,8 +610,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
{
if (error_code & mask) {
- if (buf[0])
- strcat(buf, " ");
+ strcat(buf, " ");
strcat(buf, txt);
}
}
@@ -651,23 +650,30 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
err_txt[0] = 0;

/*
- * Note: length of these appended strings including the separation space and the
- * zero delimiter must fit into err_txt[].
+ * Note: length of these appended strings including the separation
+ * space and the zero delimiter must fit into err_txt[].
*/
err_str_append(error_code, err_txt, X86_PF_PROT, "[PROT]" );
err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
+ if ((error_code & (X86_PF_WRITE | X86_PF_INSTR)) == 0)
+ strcat(err_txt, " [read]");
err_str_append(error_code, err_txt, X86_PF_USER, "[USER]" );
err_str_append(error_code, err_txt, X86_PF_RSVD, "[RSVD]" );
err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
err_str_append(error_code, err_txt, X86_PF_PK, "[PK]" );

- pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+ pr_alert("#PF error: 0x%04lx%s\n", error_code, err_txt);

+ /*
+ * The X86_PF_USER bit does *not* mean the same thing as
+ * user_mode(regs). Make sure that the unusual cases are obvious to
+ * the reader.
+ */
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
u16 ldtr, tr;

- pr_alert("This was a system access from user code\n");
+ pr_alert("NB: This was a supervisor-privileged access from user code.\n");

/*
* This can happen for quite a few reasons. The more obvious
@@ -692,6 +698,14 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad

store_tr(tr);
show_ldttss(&gdt, "TR", tr);
+ } else if ((error_code & X86_PF_USER) && !user_mode(regs)) {
+ /*
+ * This can happen due to WRUSS. If an ISA extension ever
+ * gives new instructions to do user memory access from kernel
+ * mode and we OOPS due to a broken fixup, we'll presumably land
+ * here as well.
+ */
+ pr_alert("NB: This was a user-privileged access from kernel code.\n");
}

dump_pagetable(address);