Re: [RFC] EPOLL_KILLME: New flag to epoll_wait() that subscribes process to death row (new syscall)

From: Colin Walters
Date: Wed Nov 01 2017 - 15:37:59 EST


On Wed, Nov 1, 2017, at 03:02 PM, Shawn Landden wrote:
>
> This solves the fact that epoll_pwait() already is a 6 argument (maximum allowed) syscall. But what if the process has multiple epoll() instances in multiple threads?

Well, that's a subset of the general question of - what is the interaction
of this system call and threading? It looks like you've prototyped this
out in userspace with systemd, but from a quick glance at the current git,
systemd's threading is limited to doing sync()/fsync() and gethostbyname() async.

But languages with a GC tend to at least use a background thread for that,
and of course lots of modern userspace makes heavy use of multithreading
(or variants like goroutines).

A common pattern though is to have a "main thread" that acts as a control
point and runs the mainloop (particularly for anything with a GUI). That's
going to be the thing calling prctl(SET_IDLE) - but I think its idle state should implicitly
affect the whole process, since for a lot of apps those other threads are going to
just be "background".
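
To make that pattern concrete, here is a rough userspace sketch of what I
have in mind; it is not the proposed patch. set_idle_hint() is a hypothetical
stand-in for prctl(SET_IDLE), which does not exist in mainline, so it only
records a flag. The helper names (mainloop_add_wakeup_fd, mainloop_wake,
mainloop_iterate) are made up for illustration. The point is only that the
main thread sets the hint right before it blocks and clears it on wakeup,
while background threads hand it work through an eventfd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical: stands in for the proposed prctl(SET_IDLE). No such
 * prctl option exists today, so this only records the hint locally. */
static bool process_idle_hint;

static void set_idle_hint(bool idle)
{
    process_idle_hint = idle;
}

/* Create an eventfd that background threads can use to wake the
 * mainloop, and register it with epfd. Returns the fd, or -1. */
static int mainloop_add_wakeup_fd(int epfd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);

    if (efd < 0)
        return -1;
    ev.data.fd = efd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
        close(efd);
        return -1;
    }
    return efd;
}

/* Called from a background thread to hand work to the main thread. */
static int mainloop_wake(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* One mainloop iteration: flag the process idle, block, and clear
 * the flag once epoll_wait() returns. Returns epoll_wait()'s result. */
static int mainloop_iterate(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;

    assert(epfd >= 0);
    set_idle_hint(true);    /* about to block with nothing to do */
    n = epoll_wait(epfd, &ev, 1, timeout_ms);
    set_idle_hint(false);   /* awake again: busy until the next wait */
    return n;
}
```

In this sketch only the main thread ever touches the hint, which matches the
"one thread owns the idle state" idea; whether the kernel should then treat
the whole process as idle is exactly the open question.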

It'd probably then be an error to use prctl(SET_IDLE) in more than one thread
ever? (Although that might break in golang due to the way goroutines can
be migrated across threads)

That'd probably be a good "generality test" - what would it take to have
this system call be used for a simple golang webserver app that's e.g.
socket activated by systemd, or a Kubernetes service? Or another
really interesting case would be qemu; make it easy to flag VMs as always
having this state (most of my testing VMs are like this; it's OK if they get
destroyed, I just reinitialize them from the gold state).

Going back to threading - a tricky thing we should handle in general
is when userspace libraries create threads that are unknown to the app;
the "async gethostbyname()" is a good example. To be conservative we'd
likely need to "fail non-idle", but figure out some way to tell the kernel
for e.g. GC threads that they're still idle.
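
One possible reading of "fail non-idle", modeled in plain userspace C: every
thread starts out non-idle, and the process only counts as idle once every
known thread has opted in. All of the names here (thread_state,
process_state, thread_register, thread_set_idle, process_is_idle) are
hypothetical and only illustrate the accounting; a real implementation would
keep this in kernel task state and would need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Hypothetical per-thread record, for illustration only. */
struct thread_state {
    bool in_use;
    bool idle;      /* false by default: unknown threads count as busy */
};

struct process_state {
    struct thread_state threads[MAX_THREADS];
};

/* Register a new thread. It starts non-idle, so a thread created
 * behind the app's back keeps the process from being flagged idle.
 * Returns the slot index, or -1 if the table is full. */
static int thread_register(struct process_state *p)
{
    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (!p->threads[i].in_use) {
            p->threads[i].in_use = true;
            p->threads[i].idle = false;
            return (int)i;
        }
    }
    return -1;
}

/* Explicit opt-in, e.g. from a GC thread that is only parked. */
static void thread_set_idle(struct process_state *p, int slot, bool idle)
{
    assert(slot >= 0 && slot < MAX_THREADS && p->threads[slot].in_use);
    p->threads[slot].idle = idle;
}

/* The process is idle only if it has at least one registered thread
 * and every registered thread has declared itself idle. */
static bool process_is_idle(const struct process_state *p)
{
    bool any = false;

    for (size_t i = 0; i < MAX_THREADS; i++) {
        if (!p->threads[i].in_use)
            continue;
        if (!p->threads[i].idle)
            return false;
        any = true;
    }
    return any;
}
```

With a main thread and a GC thread both marked idle, a third thread spawned
by a library (say, for async gethostbyname()) keeps process_is_idle() false
until something marks it idle too. That is the conservative behavior, at the
cost of needing a way for such threads to opt in.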