Re: [rfc][patch] Avoid taking global tasklist_lock for single threadedprocess at getrusage()

From: Ravikiran G Thirumalai
Date: Wed Mar 22 2006 - 17:15:33 EST


On Mon, Mar 20, 2006 at 09:04:56PM +0300, Oleg Nesterov wrote:
> Hello Ravikiran,

Hi Oleg, sorry for the late response..

>
> Ravikiran G Thirumalai wrote:
> >
> > Following patch avoids taking the global tasklist_lock when possible,
> > if a process is single threaded during getrusage(). Any avoidance of
> > tasklist_lock is good for NUMA boxes (and possibly for large SMPs).
> >
> > ...
> >
> > static void k_getrusage(struct task_struct *p, int who, struct rusage *r)
> > @@ -1681,14 +1697,22 @@ static void k_getrusage(struct task_stru
> > struct task_struct *t;
> > unsigned long flags;
> > cputime_t utime, stime;
> > + int need_lock = 0;
> >
> > memset((char *) r, 0, sizeof *r);
> > -
> > - if (unlikely(!p->signal))
> > - return;
> > -
> > utime = stime = cputime_zero;
> >
> > + need_lock = (p != current || !thread_group_empty(p));
> > + if (need_lock) {
> > + read_lock(&tasklist_lock);
> > + if (unlikely(!p->signal)) {
> > + read_unlock(&tasklist_lock);
> > + return;
> > + }
> > + } else
> > + /* See locking comments above */
> > + smp_rmb();
> > +
>
> I think now it is possible to improve this patch.
>
> Could you look at these patches?
>
> [PATCH] introduce lock_task_sighand() helper
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114028190927763
>
> [PATCH 0/3] make threads traversal ->siglock safe
> http://marc.theaimsgroup.com/?l=linux-kernel&m=114064825626496
>
> I think we can forget about tasklist_lock in k_getrusage() completely
> and just use lock_task_sighand().
>
> What do you think?

Great!! Nice patches to avoid tasklist lock on thread traversal.

However, I was trying to comprehend the tasklist locking changes in
2.6.16-rc6mm2 and hit upon:

__exit_signal
cleanup_sighand(tsk);
kmem_cache_free(sighand)
spin_unlock(sighand->lock)

It looked suspicious to me until I realised sighand cache now had
SLAB_DESTROY_BY_RCU. Can we please add comments (at cleanup_sighand or
__exit_signal) to make it a bit clearer for people like me :)

How is the following patch to avoid tasklist lock completely at getrusage?
(Andrew, I can remake the patch against a reverted
avoid-taking-global-tasklist_lock-for-single-threadedprocess-at-getrusage
if you prefer it that way)

Thanks,
Kiran


Change avoid-taking-global-tasklist_lock-for-single-threadedprocess-at-getrusage
patch to not take the global tasklist lock at all. We don't need to take
the tasklist lock for thread traversal of a process since Oleg's
do-__unhash_process-under-siglock.patch and related work.

Signed-off-by: Ravikiran Thirumalai <kiran@xxxxxxxxxxxx>

Index: linux-2.6.16-rc6mm2/kernel/sys.c
===================================================================
--- linux-2.6.16-rc6mm2.orig/kernel/sys.c 2006-03-21 17:04:09.000000000 -0800
+++ linux-2.6.16-rc6mm2/kernel/sys.c 2006-03-22 12:46:03.000000000 -0800
@@ -1860,23 +1860,20 @@ out:
* fields when reaping, so a sample either gets all the additions of a
* given child after it's reaped, or none so this sample is before reaping.
*
- * tasklist_lock locking optimisation:
- * If we are current and single threaded, we do not need to take the tasklist
- * lock or the siglock. No one else can take our signal_struct away,
- * no one else can reap the children to update signal->c* counters, and
- * no one else can race with the signal-> fields.
- * If we do not take the tasklist_lock, the signal-> fields could be read
- * out of order while another thread was just exiting. So we place a
- * read memory barrier when we avoid the lock. On the writer side,
- * write memory barrier is implied in __exit_signal as __exit_signal releases
- * the siglock spinlock after updating the signal-> fields.
- *
- * We don't really need the siglock when we access the non c* fields
- * of the signal_struct (for RUSAGE_SELF) even in multithreaded
- * case, since we take the tasklist lock for read and the non c* signal->
- * fields are updated only in __exit_signal, which is called with
- * tasklist_lock taken for write, hence these two threads cannot execute
- * concurrently.
+ * Locking:
+ * We need to take the siglock for CHILDEREN, SELF and BOTH
+ * for the cases current multithreaded, non-current single threaded
+ * non-current multithreaded. Thread traversal is now safe with
+ * the siglock held.
+ * Strictly speaking, we donot need to take the siglock if we are current and
+ * single threaded, as no one else can take our signal_struct away, no one
+ * else can reap the children to update signal->c* counters, and no one else
+ * can race with the signal-> fields. If we do not take any lock, the
+ * signal-> fields could be read out of order while another thread was just
+ * exiting. So we should place a read memory barrier when we avoid the lock.
+ * On the writer side, write memory barrier is implied in __exit_signal
+ * as __exit_signal releases the siglock spinlock after updating the signal->
+ * fields. But we don't do this yet to keep things simple.
*
*/

@@ -1885,35 +1882,25 @@ static void k_getrusage(struct task_stru
struct task_struct *t;
unsigned long flags;
cputime_t utime, stime;
- int need_lock = 0;

memset((char *) r, 0, sizeof *r);
utime = stime = cputime_zero;

- if (p != current || !thread_group_empty(p))
- need_lock = 1;
-
- if (need_lock) {
- read_lock(&tasklist_lock);
- if (unlikely(!p->signal)) {
- read_unlock(&tasklist_lock);
- return;
- }
- } else
- /* See locking comments above */
- smp_rmb();
+ rcu_read_lock();
+ if (!lock_task_sighand(p, &flags)) {
+ rcu_read_unlock();
+ return;
+ }

switch (who) {
case RUSAGE_BOTH:
case RUSAGE_CHILDREN:
- spin_lock_irqsave(&p->sighand->siglock, flags);
utime = p->signal->cutime;
stime = p->signal->cstime;
r->ru_nvcsw = p->signal->cnvcsw;
r->ru_nivcsw = p->signal->cnivcsw;
r->ru_minflt = p->signal->cmin_flt;
r->ru_majflt = p->signal->cmaj_flt;
- spin_unlock_irqrestore(&p->sighand->siglock, flags);

if (who == RUSAGE_CHILDREN)
break;
@@ -1940,9 +1927,10 @@ static void k_getrusage(struct task_stru
default:
BUG();
}
-
- if (need_lock)
- read_unlock(&tasklist_lock);
+
+ unlock_task_sighand(p, &flags);
+ rcu_read_unlock();
+
cputime_to_timeval(utime, &r->ru_utime);
cputime_to_timeval(stime, &r->ru_stime);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/