Re: [RFC][PATCH] exec: Conceal the other threads from wakeups during exec

From: Eric W. Biederman
Date: Fri Jul 31 2020 - 16:10:23 EST


Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:

> On Fri, Jul 31, 2020 at 10:19 AM Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Even limited to opt-in locations I think the trick of being able to
>> transform the wait-state may solve that composition problem.
>
> So the part I found intriguing was the "catch things in the signal
> handling path".
>
> Catching things there - and *only* there - would avoid a lot of the
> problems we had with the freezer. When you're about to return to user
> mode, there are no lock inversions etc.
>
> And it kind of makes conceptual sense to do, since what you're trying
> to capture is the signal group - so using the signal state to do so
> seems like a natural thing to do. No touching of any runqueues or
> scheduler data structures, do everything _purely_ with the signal
> handling pathways.
>
> So that "feels" ok to me.
>
> That said, I do wonder if there are nasty nasty latency issues with
> odd users. Normally, you'd expect that execve() with other threads in
> the group shouldn't be a performance issue, because people simply
> shouldn't do that. So it might be ok.
>
> And if you capture them all in the signal handling pathway, that ends
> up being a very convenient place to zap them all too, so maybe my
> latency worry is misguided.
>
> IOW, I think that you could try to do your "freeze other threads" not
> at all like the freezer, but more like a "collect all threads in their
> signal handler parts as the first phase of zapping them".
>
> So maybe this approach is salvageable. I see where something like the
> above could work well. But I say that with a lot of handwaving, and
> maybe if I see the patch I'd go "Christ, I was a complete idiot for
> ever even suggesting that".

Yes.

The tricky bit is that there are a handful of stops that must
be handled, or it is impossible to stop everything without causing
disruption if the exec fails. The big ones are TASK_STOPPED and
TASK_TRACED. There is another in wait_for_vfork_done.

At which point I am looking at writing a wrapper around schedule that
changes task state to something like TASK_WAKEKILL when asked, and then
restores the state when released. Something that is independent of
which freezer the code is using.
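
To make the save-and-restore idea concrete, here is a rough user-space
model in C. It is not kernel code and not a proposed patch. Every name
in it (model_task, model_hold, model_release, MODEL_EXEC_HELD) is
hypothetical, and the real thing would have to deal with locking and
the actual task state bits, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task states; not the kernel's values. */
enum model_state {
	MODEL_RUNNING,
	MODEL_STOPPED,    /* models a TASK_STOPPED-like wait */
	MODEL_TRACED,     /* models a TASK_TRACED-like wait */
	MODEL_EXEC_HELD,  /* hypothetical state used while exec holds the task */
};

struct model_task {
	enum model_state state;        /* state currently visible to wakers */
	enum model_state saved_state;  /* original state, kept while held */
	bool held;                     /* true between hold and release */
};

/* When asked: transform the wait state, remembering the original. */
static void model_hold(struct model_task *t)
{
	if (t->held)
		return;                /* already held, keep the first saved state */
	t->saved_state = t->state;
	t->state = MODEL_EXEC_HELD;
	t->held = true;
}

/* When released (for example, the exec failed): restore the original. */
static void model_release(struct model_task *t)
{
	if (!t->held)
		return;
	assert(t->state == MODEL_EXEC_HELD);
	t->state = t->saved_state;
	t->held = false;
}
```

The point of the model is only that a failed exec leaves a stopped or
traced task exactly as it found it: hold, then release, gets back the
original state.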

It could be done in the scheduler too, with a special bit in state that
says opt-in. But if we have to opt in it is probably much less error
prone to write the code as a wrapper around schedule, and only
modify the core scheduling code if necessary.

If I can make TASK_STOPPED and TASK_TRACED handle spurious wake-ups
I think I can build something that is independent of the rest of the
freezers so the code doesn't have to go 3 deep on wrappers of different
freezers at those locations. It is already 2 layers deep.

But I really don't intend to work on that again for a while.


Right now I am in the final stages of resurrecting:
https://lore.kernel.org/linux-fsdevel/87a7ohs5ow.fsf@xxxxxxxxxxxx/

The hard part looks like cleaning up and resurrecting Oleg's patch
to prevent the abuse of files_struct->count.
https://lore.kernel.org/linux-fsdevel/20180915160423.GA31461@xxxxxxxxxx/

I am close but dotting all of the i's and crossing all of the t's is
taking a bit.

Eric