Re: [PATCH v2] x86/fault: Decode and print #PF oops in human readable form

From: Sean Christopherson
Date: Mon Dec 10 2018 - 11:04:30 EST


On Fri, Dec 07, 2018 at 03:57:10PM -0800, Andy Lutomirski wrote:
> On Fri, Dec 7, 2018 at 2:14 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Dec 7, 2018 at 2:06 PM Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > Looking at it again, my own personal preference would be to swap the order
> > > of the #PF lines.
> >
> > Yeah, probably.
> >
> > Also:
> >
> > > [ 160.246820] BUG: unable to handle kernel paging request at ffffbeef00000000
> > > [ 160.247517] #PF: supervisor-privileged instruction fetch from kernel code
> > > [ 160.248085] #PF: error_code(0x0010) - not-present page
> >
> > With this form, I think the "kernel" in the first line is actually
> > misleading. Yes, it's a #PF for the kernel, but then the "kernel" on
> > the second line talks about what mode we were in when it happened, so
> > we have two different meanings of "kernel" on two adjacent lines.
>
> I'm okay with this variant. I have a slight preference for:
>
> #PF: supervisor-privileged instruction fetch from kernel code
> #PF error_code: 0x0010 [READ]

[INSTR], but I get the gist :)

> Which is what we'd get from Sean's patch plus my patch here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm&id=ccfb1941f90153818c07fb1a7dc22121a970d252
>
> Sean, what do you think?

Munging the two concepts is my least favorite approach. In many cases,
printing the individual bits is redundant with the first line, and in
others it is superfluous, e.g. [PROT] is effectively implied by [RSVD],
[PK] and [SGX].

In the example above, "[INSTR]" adds no new information, since the line
above already says it was an instruction fetch. A list of bits also
never gives a human-readable explanation of *why* the fault occurred.

It'd be more palatable if we printed the negative case for PROT, e.g.
"[!PROT]", but that reopens the discussion of which bits should be
printed in the negative case. As Ingo said, it's rather arbitrary
that the encoding is USER=1 rather than SUPERVISOR=1.

> > So maybe that "BUG: unable to handle kernel paging request" message
> > should be something like
> >
> > "BUG: unable to handle page fault for address ffffbeef00000000"
> >
> > instead? Does that make sense to people?
>
> Yes please.