Re: [RFC][PATCH] exec: Freeze the other threads during a multi-threaded exec
From: Linus Torvalds
Date: Mon Jul 27 2020 - 20:20:24 EST
On Mon, Jul 27, 2020 at 2:06 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Therefore make it simpler to get exec correct by freezing the other
> threads at the beginning of exec. This removes an entire class of
> races, and makes it tractable to fix some of the long standing
> issues with exec.
I hate the global state part of the freezer.
It's also pointless. We don't want to trigger all the tests that
various random driver kernel threads do.
I also really don't like how now execve() by any random person will
suddenly impact everything that might be doing freezing.
It also makes for a possible _huge_ latency regression for execve(),
since freezing really has never been a very low-latency operation.
Other threads doing IO can now basically block execve() for a long
long long time.
Finally, I think your patch is fundamentally broken for another
reason: it depends on CONFIG_FREEZER, and that isn't even required to
be set!
So no, this is not at all acceptable in that form.
Now, maybe we could _make_ it acceptable, by
(a) add a per-process freezer count to avoid the global state for this case
(b) make a small subset of the freezing code available for the
!CONFIG_FREEZER thing
(c) fix this "simple freezer" to not actually force wakeups etc, but
catch things in the
but honestly, at that point nothing of the "CONFIG_FREEZER" code even
really exists any more. It would be more of a "execve_synchronize()"
thing, where we'd catch things in the scheduler and/or system call
entry/exit or whatever.
Also, that makes these kinds of nasty hacks that just make the
existign freezer code even harder to figure out:
> A new function exec_freeze_threads based upon
> kernel/power/process.c:try_to_freeze_tasks is added. To play well
> with other uses of the kernel freezer it uses a killable sleep wrapped
> with freezer_do_not_count/freezer_count.
Ugh. Just _ugly_.
And honestly, completely and utterly broken. See above.
I understand the wish to re-use existing infrastructure. But the fact
is, the FREEZER code is just about the _last_ thing you should want to
use. That, and stop_machine(), is just too much of a big hammer.
Linus