Re: PANIC: double fault, error_code: 0x0 in 4.0.0-rc3-2, kvm related?

From: Andy Lutomirski
Date: Wed Mar 18 2015 - 17:17:36 EST


On Wed, Mar 18, 2015 at 2:06 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> On 03/18/2015 09:49 PM, Andy Lutomirski wrote:
>> On Wed, Mar 18, 2015 at 1:06 PM, Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>>> On 03/18/2015 08:26 PM, Andy Lutomirski wrote:
>>>> Hi Linus-
>>>>
>>>> You seem to enjoy debugging these things. Want to give this a shot?
>>>> My guess is a vmalloc fault accessing either old_rsp or kernel_stack
>>>> right after swapgs in syscall entry.
>>>
>>> The code is:
>>>
>>> ENTRY(system_call)
>>> SWAPGS_UNSAFE_STACK
>>> GLOBAL(system_call_after_swapgs)
>>> movq %rsp,PER_CPU_VAR(rsp_scratch)
>>> movq PER_CPU_VAR(kernel_stack),%rsp
>>>
>>> If a PER_CPU_VAR(var) memory access can page fault
>>> (I thought it was guaranteed never to fault),
>>> then a page fault on either of these two instructions
>>> will be fatal: %rsp still holds the userspace value.
>>>
>>> I thought the only things we could get here were an NMI
>>> or a debug exception, and both are set up to use IST stacks
>>> to prevent this scenario (among other reasons).
>>
>> I don't think that #DB is possible -- we should never have a
>> watchpoint on percpu memory like that (unless we're using kgdb, in
>> which case I think that kgdb should be fixed).
>
> And #DB shouldn't cause a problem even if it happens (it's on
> an IST stack).
>
> I thought about it some more, and the thing is, the CPU did manage
> to enter the page fault handler.
>
> That means it managed to store the iret frame.
>
> So stores to (%rsp) worked, whatever %rsp was
> (even if it pointed to a user page).
>
> The double fault happened only when the CALL insn inside the handler
> attempted to push yet another word. _This_ is what did not work.
>
> Why?
>
> I'm almost ready to declare that it's SMAP triggering:
> attempts to access (write to) userspace were caught.
> However, disassembly shows
>
> crash> disassemble page_fault
> Dump of assembler code for function page_fault:
> 0xffffffff816834a0 <+0>: data32 xchg %ax,%ax
> 0xffffffff816834a3 <+3>: data32 xchg %ax,%ax
> 0xffffffff816834a6 <+6>: data32 xchg %ax,%ax
> 0xffffffff816834a9 <+9>: sub $0x78,%rsp
> 0xffffffff816834ad <+13>: callq 0xffffffff81683620 <error_entry>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^KABOOM HERE^^^^^^^^^^^^^^^^^^^^^^^
> 0xffffffff816834b2 <+18>: mov %rsp,%rdi
> 0xffffffff816834b5 <+21>: mov 0x78(%rsp),%rsi
> 0xffffffff816834ba <+26>: movq $0xffffffffffffffff,0x78(%rsp)
> 0xffffffff816834c3 <+35>: callq 0xffffffff810504e0 <do_page_fault>
> 0xffffffff816834c8 <+40>: jmpq 0xffffffff816836d0 <error_exit>
> End of assembler dump.
>
> Those NOPs at the beginning are ASM_CLAC and PARAVIRT_ADJUST_EXCEPTION_FRAME
> from this source:
>
>
> .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
> ENTRY(\sym)
> /* Sanity check */
> .if \shift_ist != -1 && \paranoid == 0
> .error "using shift_ist requires paranoid=1"
> .endif
>
> .if \has_error_code
> XCPT_FRAME
> .else
> INTR_FRAME
> .endif
>
> ASM_CLAC
> PARAVIRT_ADJUST_EXCEPTION_FRAME
>
> subq $ORIG_RAX-R15, %rsp
> call error_entry
> ...
>
> If ASM_CLAC is replaced by NOPs, this CPU must not be SMAP-capable.
> If so, then another store to (%rsp) should have worked too...
>
>
> Stefan, Takashi - are you seeing this on SMAP-capable CPUs?

That's why I asked if this was Broadwell. It's not :(

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/