RE: [RFC 00/14] Dynamic Kernel Stacks

From: David Laight
Date: Mon Mar 18 2024 - 11:39:19 EST


...
> - exit_to_user_mode(): Unmap the extra three pages and return them to
> the per-CPU cache. This function is called late in the kernel exit
> path.

Why bother?
The number of tasks running in user mode is limited to the number
of CPUs. So the most you save is a few pages per CPU.
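
As a rough back-of-the-envelope sketch of that bound: the three-page
figure comes from the RFC text quoted above, while the 4 KiB page size
and the helper name max_bytes_saved() are assumptions made up for
illustration.

```c
/* Assumption: 4 KiB base pages (e.g. x86-64); not stated in the RFC text. */
#define ASSUMED_PAGE_SIZE 4096UL
/* From the quoted RFC text: three extra pages are unmapped on exit. */
#define EXTRA_STACK_PAGES 3UL

/*
 * Hypothetical helper: upper bound on the memory freed by unmapping in
 * exit_to_user_mode(), given that at most one task per CPU can be
 * executing in user mode at any instant.
 */
unsigned long max_bytes_saved(unsigned long nr_cpus)
{
	return nr_cpus * EXTRA_STACK_PAGES * ASSUMED_PAGE_SIZE;
}
```

Under those assumptions a 64-CPU machine saves at most 64 * 3 * 4096
bytes, i.e. 768 KiB, from tasks that are actually running in user mode.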

Plausibly a context switch from an interrupt (eg timer tick)
could suspend a task without saving anything on its kernel stack.
But how common is that in reality?
In a well behaved system most user threads will be sleeping on
some event - so with an active kernel stack.

I can also imagine that something like sys_epoll() actually
sleeps without (that) much stack allocated.
But the calls into all the drivers to check the status
could easily go into another page.
You really wouldn't want to keep allocating and deallocating
physical pages (which I'm sure has TLB flushing costs)
all the time for those processes.

Perhaps a 'garbage collection' activity that reclaims stack
pages from processes that have been asleep 'for a while' or
haven't used a lot of stack recently (if hw 'page accessed'
bit can be used) might make more sense.
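
A minimal userspace model of what such a pass might look like. Every
name here (struct stack_model, live_pages, reclaim_idle_stack_pages())
is hypothetical and nothing in it is kernel API. It also assumes the
two conditions are combined (asleep long enough AND page not touched
since the previous scan), and that pages holding live frames at the
saved stack pointer are never candidates.

```c
#include <stdbool.h>

#define EXTRA_STACK_PAGES 3

/* Hypothetical per-task bookkeeping; not a real kernel structure. */
struct stack_model {
	unsigned long asleep_since;        /* time the task went to sleep */
	int live_pages;                    /* extra pages holding live frames */
	bool mapped[EXTRA_STACK_PAGES];    /* page currently backed by memory */
	bool accessed[EXTRA_STACK_PAGES];  /* models the hw 'page accessed' bit */
};

/*
 * Reclaim extra stack pages from a task that has been asleep for at
 * least idle_threshold time units. A page is reclaimed only if it lies
 * beyond the live part of the stack, is mapped, and was not accessed
 * since the previous scan. Accessed bits are cleared on every scan, so
 * a page touched recently survives one pass and goes on the next.
 * Assumes now >= s->asleep_since. Returns the number of pages reclaimed.
 */
int reclaim_idle_stack_pages(struct stack_model *s, unsigned long now,
			     unsigned long idle_threshold)
{
	int reclaimed = 0;

	if (now - s->asleep_since < idle_threshold)
		return 0;

	for (int i = 0; i < EXTRA_STACK_PAGES; i++) {
		if (i >= s->live_pages && s->mapped[i] && !s->accessed[i]) {
			s->mapped[i] = false;
			reclaimed++;
		}
		s->accessed[i] = false;
	}
	return reclaimed;
}
```

The point of the two-pass aging is that a task which regularly dips
into the second or third page keeps them, so the map/unmap (and TLB
flush) cost is only paid for stacks that have genuinely gone quiet.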

Have you done any instrumentation to see which system calls
are actually using more than (say) 8k of stack?
And how often the user threads that make those calls do so?

David
