Re: [PATCH -rt] Fix TASKLET_STATE_SCHED WARN_ON()

From: john stultz
Date: Thu Jun 14 2007 - 17:21:06 EST


On Tue, 2007-06-05 at 19:17 -0700, john stultz wrote:
> Hey Ingo,
> So we've been seeing the following trace fairly frequently on our SMP
> boxes when running kernbench:
>
> BUG: at kernel/softirq.c:639 __tasklet_action()
>
> Call Trace:
> [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
> [<ffffffff8106d89b>] show_trace+0x41/0x5c
> [<ffffffff8106d8cb>] dump_stack+0x15/0x17
> [<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
> [<ffffffff81094f76>] tasklet_action+0x27/0x29
> [<ffffffff8109530a>] ksoftirqd+0x16c/0x271
> [<ffffffff81033d4d>] kthread+0xf5/0x128
> [<ffffffff8105ff68>] child_rip+0xa/0x12
>
>
> Paul also pointed this out a while back: http://lkml.org/lkml/2007/2/25/1
>
>
> Anyway, I think I finally found the issue. It's a bit hard to explain,
> but the idea is: while __tasklet_action is running the tasklet function
> on CPU1, tasklet_schedule() can be called on CPU2, and if CPU2 is
> preempted right after it marks the TASKLET_STATE_SCHED bit,
> __tasklet_action on CPU1 may re-run the function, clear the bit, and
> unlock the tasklet before CPU2 enters __tasklet_common_schedule.
> Once __tasklet_common_schedule locks the tasklet, it will add the
> tasklet to the list with TASKLET_STATE_SCHED *unset*.
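>
> For reference, tasklet_schedule() itself has the same shape as the
> mainline version (in the -rt tree it funnels into
> __tasklet_common_schedule()):
>
>   static inline void tasklet_schedule(struct tasklet_struct *t)
>   {
>   	/* Only enter the schedule path if SCHED was not already set */
>   	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
>   		__tasklet_schedule(t);
>   }
>
> The race window is between this test_and_set_bit() and the list add
> in __tasklet_common_schedule().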
>
> I've verified this race occurs with a WARN_ON in
> __tasklet_common_schedule().
>
>
> This fix avoids this race by making sure *after* we've locked the
> tasklet that the STATE_SCHED bit is set before adding it to the list.
>
> Does it look ok to you?
>
> thanks
> -john
>
> Signed-off-by: John Stultz <johnstul@xxxxxxxxxx>
>
> Index: 2.6-rt/kernel/softirq.c
> ===================================================================
> --- 2.6-rt.orig/kernel/softirq.c 2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c 2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
> __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
> {
> if (tasklet_trylock(t)) {
> - WARN_ON(t->next != NULL);
> - t->next = head->list;
> - head->list = t;
> - raise_softirq_irqoff(nr);
> + /* We may have been preempted before tasklet_trylock
> + * and __tasklet_action may have already run.
> + * So double-check the sched bit while the tasklet
> + * is locked before adding it to the list.
> + */
> + if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> + WARN_ON(t->next != NULL);
> + t->next = head->list;
> + head->list = t;
> + raise_softirq_irqoff(nr);
> + }
> tasklet_unlock(t);
> }
> }

So while digging into a strange OOM issue we were seeing (which actually
ended up being fixed by Steven's softirq patch), I noticed that the fix
above is incomplete. With only the patch above we no longer add
unscheduled tasklets to the list, but we may end up with scheduled
tasklets that are not on the list (and will stay that way!).
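
Concretely, with only the first patch applied, the interleaving looks
like this (CPU numbers are illustrative):

    CPU1                                CPU2
    ----                                ----
    __tasklet_common_schedule():
      tasklet_trylock() succeeds
      sees TASKLET_STATE_SCHED clear,
      so skips the list add
      <preempted>
                                        tasklet_schedule():
                                          sets TASKLET_STATE_SCHED
                                          __tasklet_common_schedule():
                                            tasklet_trylock() fails
                                            (CPU1 holds the RUN bit),
                                            so it returns
      tasklet_unlock()

CPU2's schedule attempt is lost: TASKLET_STATE_SCHED is set, but the
tasklet is on no list, and nothing will ever add it.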

The following additional patch should correct this issue. Since we
weren't actually hitting it, though, the issue is a bit theoretical,
and I've not been able to prove the patch really fixes anything.

thanks
-john

Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c 2007-06-12 20:30:11.000000000 -0700
+++ 2.6-rt/kernel/softirq.c 2007-06-12 20:37:22.000000000 -0700
@@ -544,6 +544,7 @@
__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
{
if (tasklet_trylock(t)) {
+again:
/* We may have been preempted before tasklet_trylock
* and __tasklet_action may have already run.
* So double-check the sched bit while the tasklet
@@ -554,8 +555,21 @@
t->next = head->list;
head->list = t;
raise_softirq_irqoff(nr);
+ tasklet_unlock(t);
+ } else {
+ /* This is subtle. If we hit the corner case above,
+ * it is possible that we get preempted right here,
+ * and another task successfully calls
+ * tasklet_schedule(), then this function, and
+ * fails on the trylock. Thus we must be sure,
+ * before releasing the tasklet lock, that the
+ * SCHED bit is clear. Otherwise the tasklet
+ * may end up with its SCHED bit set, but not
+ * added to the list.
+ */
+ if (!tasklet_tryunlock(t))
+ goto again;
}
- tasklet_unlock(t);
}
}
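
For completeness: the retry loop relies on the -rt tasklet locking
primitives. tasklet_trylock() is the mainline SMP version; the
tasklet_tryunlock() below is my rough paraphrase of the -rt helper
(the exact code in the -rt tree may differ):

	static inline int tasklet_trylock(struct tasklet_struct *t)
	{
		return !test_and_set_bit(TASKLET_STATE_RUN, &t->state);
	}

	static inline int tasklet_tryunlock(struct tasklet_struct *t)
	{
		/*
		 * Atomically clear the RUN bit, but only if RUN is the
		 * sole bit set. If TASKLET_STATE_SCHED was set while
		 * we held the lock, the cmpxchg fails and the caller
		 * loops back to redo the list add.
		 */
		return cmpxchg(&t->state, 1UL << TASKLET_STATE_RUN, 0) ==
				1UL << TASKLET_STATE_RUN;
	}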


