Re: [PATCH 05/11] mm: Introduce arch_pgd_init_late()

From: Andy Lutomirski
Date: Tue Sep 22 2015 - 14:52:56 EST


On Tue, Sep 22, 2015 at 11:44 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Sep 22, 2015 at 11:37 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> kinds of mess.
>>
>> I don't think that anyone really wants to move #PF to IST, which means
>> that we simply cannot handle vmalloc faults that happen when switching
>> stacks after SYSCALL, no matter what fanciness we shove into the
>> page_fault asm.
>
> But that's fine. The kernel stack is special. So yes, we want to make
> sure that the kernel stack is always mapped in the thread whose stack
> it is.
>
> But that's not a big and onerous guarantee to make. Not when the
> *real* problem is "random vmalloc allocations made by other processes
> that we are not in the least interested in, and we don't want to add
> synchronization for".
>

It's the kernel stack, the TSS (for sp0) and rsp_scratch at least.
But yes, that's not that onerous, and it's never lazily initialized
elsewhere.

How about this (long-term, not right now): Never free pgd entries.
For each pgd, track the number of populated kernel entries. Also
track the global (init_mm) number of existing kernel entries. At
context switch time, if new_pgd has fewer entries than the total, sync
it.

This hits *at most* 256 times per thread, and otherwise it's just a
single unlikely branch. It guarantees that we only ever take a
vmalloc fault when accessing mappings that didn't exist when we last
context switched, which gets us all of the important percpu stuff and
the kernel stack, even if we schedule onto a cpu that didn't exist
when the mm was created.
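A rough userspace model of the counting scheme above may help. It is not kernel code. Every name in it is hypothetical: model_pgd, reference_pgd (standing in for init_mm's pgd), reference_populate, switch_to_pgd, and demo_worst_case. It assumes the x86-64 4-level layout, in which the upper 256 of the 512 top-level entries belong to the kernel. Because entries are never freed, a pgd's count only grows. Each sync raises it by at least one, so the sync path can run at most 256 times per pgd, and every other switch costs a single comparison.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace model, not kernel code. Assumes the x86-64
 * 4-level layout: 512 top-level entries, the upper 256 for the kernel. */
#define MODEL_PTRS_PER_PGD 512
#define MODEL_KERNEL_FIRST 256
#define MODEL_KERNEL_SLOTS (MODEL_PTRS_PER_PGD - MODEL_KERNEL_FIRST)

typedef unsigned long model_pgd_entry; /* 0 means "not populated" */

struct model_pgd {
	model_pgd_entry entries[MODEL_PTRS_PER_PGD];
	unsigned int nr_kernel; /* populated kernel entries in this pgd */
	unsigned int nr_syncs;  /* times this pgd was synced (for the demo) */
};

/* Stands in for init_mm's pgd and the global count of kernel entries. */
static struct model_pgd reference_pgd;

/* Populate a kernel top-level entry in the reference table. Entries are
 * never freed, so nr_kernel only ever grows. */
static void reference_populate(unsigned int idx, model_pgd_entry val)
{
	assert(idx >= MODEL_KERNEL_FIRST && idx < MODEL_PTRS_PER_PGD);
	assert(val != 0);
	if (reference_pgd.entries[idx] == 0) {
		reference_pgd.entries[idx] = val;
		reference_pgd.nr_kernel++;
	}
}

/* Modelled context-switch hook: one comparison in the common case,
 * and a copy of the kernel half only when next is behind the reference. */
static void switch_to_pgd(struct model_pgd *next)
{
	if (next->nr_kernel < reference_pgd.nr_kernel) { /* unlikely() in-kernel */
		memcpy(&next->entries[MODEL_KERNEL_FIRST],
		       &reference_pgd.entries[MODEL_KERNEL_FIRST],
		       MODEL_KERNEL_SLOTS * sizeof(model_pgd_entry));
		next->nr_kernel = reference_pgd.nr_kernel;
		next->nr_syncs++;
	}
}

/* Worst case: a new kernel entry appears before every switch. Returns the
 * number of syncs, which cannot exceed MODEL_KERNEL_SLOTS because each
 * sync strictly raises nr_kernel and nr_kernel never drops. */
unsigned int demo_worst_case(void)
{
	static struct model_pgd task_pgd; /* starts with no kernel entries */
	unsigned int idx;

	for (idx = MODEL_KERNEL_FIRST; idx < MODEL_PTRS_PER_PGD; idx++) {
		reference_populate(idx, 0x1000UL + idx);
		switch_to_pgd(&task_pgd); /* behind by one entry: syncs */
		switch_to_pgd(&task_pgd); /* up to date: branch not taken */
	}
	return task_pgd.nr_syncs;
}
```

In this worst case, demo_worst_case() reports exactly 256 syncs. The second switch in each iteration never syncs, which corresponds to the "single unlikely branch" cost.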

--Andy

> Linus



--
Andy Lutomirski
AMA Capital Management, LLC