Re: [PATCH] signal: Allow tasks to cache one sigqueue struct (again).

From: Linus Torvalds
Date: Tue Nov 29 2022 - 15:06:54 EST


On Tue, Nov 29, 2022 at 10:22 AM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> That very much means "don't try to revive a patch that was already
> reverted because it was broken and didn't make sense". Throw this
> patch AWAY. It's not worth reviving. Start from scratch, with that
> "this needs to be _obviously_ correct" as the starting point.

Ok, so I realize that I said "obviously correct", and then I'm sending
out this ENTIRELY UNTESTED PATCH as an attachment. And I even made it
not build, on purpose.

In other words, the attached patch is anything *but* obviously
correct, but I'm sending it out as a "start from scratch" suggestion
for "maybe something like this".

Maybe this doesn't catch enough cases to really be worth it, but as a
starting point it

(a) obviously done under the sighand->siglock or when the task is
dead and cannot be dereferenced

(b) only adds sigqueues that have SIGQUEUE_PREALLOC set, and with an
empty queue

where (a) makes me at least go "the locking is fairly simple" and (b)
makes me go "and we don't have any rlimit counting issues" and it
really just acts as a single front-end cache.

As a result of (b), the freeing part should be just that kmem_cache_free().

And yeah, that doesn't actually build as-is, because 'sigqueue_cachep'
is local to kernel/signal.c, but rather than fix that I left this just
broken because I get the feeling that __exit_signal() should just be
*moved* into kernel/signal.c, but I didn't think about it enough.

In other words, this is *really* meant as not a good patch, but as a
"can we please go in this direction instead".

And yes, maybe some other "free this siginfo" paths should *also* do
that "try to add it to the cache, and if so, you're done" after doing
the accounting. So this is probably all mis-guided, but I ended up
wanting to just see how an alternative approach would look, and this
feels a lot safer to me.

We already have that SIGQUEUE_PREALLOC case, why not use some of the
same logic for the cached entry. Sure, it bypasses the rlimit, but
what else is new?

Maybe it all needs to do accounting, but at least in this form it
feels "safe", in that you have to free one of those unaccounted
entries in order to get another unaccounted entry.

Linus
include/linux/sched.h | 1 +
kernel/exit.c | 4 ++++
kernel/fork.c | 2 ++
kernel/signal.c | 20 ++++++++++++++++++++
4 files changed, 27 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ffb6eb55cd13..ac44ed5bb0ab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1103,6 +1103,7 @@ struct task_struct {
/* Signal handlers: */
struct signal_struct *signal;
struct sighand_struct __rcu *sighand;
+ struct sigqueue *sigqueue_cache;
sigset_t blocked;
sigset_t real_blocked;
/* Restored if set_restore_sigmask() was used: */
diff --git a/kernel/exit.c b/kernel/exit.c
index 35e0a31a0315..8d287c8481b4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -157,6 +157,10 @@ static void __exit_signal(struct task_struct *tsk)
*/
flush_sigqueue(&tsk->pending);
tsk->sighand = NULL;
+ if (tsk->sigqueue_cache) {
+ kmem_cache_free(sigqueue_cachep, tsk->sigqueue_cache);
+ tsk->sigqueue_cache = NULL;
+ }
spin_unlock(&sighand->siglock);

__cleanup_sighand(sighand);
diff --git a/kernel/fork.c b/kernel/fork.c
index 08969f5aa38d..96ffbce88aa3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -535,6 +535,8 @@ void put_task_stack(struct task_struct *tsk)

void free_task(struct task_struct *tsk)
{
+ if (tsk->sigqueue_cache)
+ kmem_cache_free(sigqueue_cachep, tsk->sigqueue_cache);
release_user_cpus_ptr(tsk);
scs_release(tsk);

diff --git a/kernel/signal.c b/kernel/signal.c
index d140672185a4..ccc0c7aa84e6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1107,6 +1107,12 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
if ((sig == SIGKILL) || (t->flags & PF_KTHREAD))
goto out_set;

+ if (t->sigqueue_cache) {
+ q = t->sigqueue_cache;
+ t->sigqueue_cache = NULL;
+ goto add_sigqueue;
+ }
+
/*
* Real-time signals must be queued if sent by sigqueue, or
* some other real-time mechanism. It is implementation
@@ -1124,6 +1130,7 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
q = __sigqueue_alloc(sig, t, GFP_ATOMIC, override_rlimit, 0);

if (q) {
+add_sigqueue:
list_add_tail(&q->list, &pending->list);
switch ((unsigned long) info) {
case (unsigned long) SEND_SIG_NOINFO:
@@ -1933,6 +1940,13 @@ struct sigqueue *sigqueue_alloc(void)
return __sigqueue_alloc(-1, current, GFP_KERNEL, 0, SIGQUEUE_PREALLOC);
}

+/*
+ * We only add sigqueues with SIGQUEUE_PREALLOC set
+ * and list_empty(&q->list) to the sigqueue_cache.
+ *
+ * That automatically means that we don't count them
+ * towards any limits.
+ */
void sigqueue_free(struct sigqueue *q)
{
unsigned long flags;
@@ -1945,6 +1959,12 @@ void sigqueue_free(struct sigqueue *q)
* __exit_signal()->flush_sigqueue().
*/
spin_lock_irqsave(lock, flags);
+ if (!current->sigqueue_cache && list_empty(&q->list)) {
+ current->sigqueue_cache = q;
+ spin_unlock_irqrestore(lock, flags);
+ return;
+ }
+
q->flags &= ~SIGQUEUE_PREALLOC;
/*
* If it is queued it will be freed when dequeued,