Re: [PATCH 5/5] sched,futex: Provide delayed wakeup list

From: Peter Zijlstra
Date: Sat Nov 23 2013 - 06:48:55 EST


On Fri, Nov 22, 2013 at 04:56:37PM -0800, Davidlohr Bueso wrote:
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>
> Original patchset: https://lkml.org/lkml/2011/9/14/118
>
> This is useful for locking primitives that can effect multiple
> wakeups per operation and want to avoid lock-internal lock contention
> by delaying the wakeups until we've released the lock-internal locks.
>
> Alternatively it can be used to avoid issuing multiple wakeups, and
> thus save a few cycles, in packet processing. Queue all target tasks
> and wake them up once you've processed all packets. That way you avoid
> waking the target task multiple times if there were multiple packets
> for the same task.
>
> This patch adds the needed infrastructure into the scheduler code
> and uses the new wake_list to delay the futex wakeups until
> after we've released the hash bucket locks. This keeps the newly
> woken tasks from immediately getting stuck on the hb->lock.
>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Darren Hart <dvhart@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Mike Galbraith <efault@xxxxxx>
> Cc: Jeff Mahoney <jeffm@xxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Scott Norton <scott.norton@xxxxxx>
> Cc: Tom Vaden <tom.vaden@xxxxxx>
> Cc: Aswin Chandramouleeswaran <aswin@xxxxxx>
> Cc: Waiman Long <Waiman.Long@xxxxxx>
> Tested-by: Jason Low <jason.low2@xxxxxx>
> [forward ported]
> Signed-off-by: Davidlohr Bueso <davidlohr@xxxxxx>
> ---
> Please note that in the original thread there was some debate
> about spurious wakeups (https://lkml.org/lkml/2011/9/17/31), so
> you can consider this more of an RFC patch if folks believe that
> this functionality is incomplete/buggy.

Right, from what I remember, this patch can cause spurious wakeups, and
while all our regular sleeping lock / wait thingies can deal with this,
not all creative schedule() usage in the tree can.
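
As a rough illustration of how a delayed wakeup could turn spurious, here
is a small userspace model. It is not kernel code, and sim_task,
sim_wake_list and the wait_kind values are made-up names. The scenario
assumed here: a task is put on the wake list under the lock, but before
the list is flushed it gets woken some other way (say a timeout) and goes
to sleep on something unrelated. The deferred wakeup then lands in the
wrong wait.

```c
#include <stdbool.h>
#include <stddef.h>

/* What a simulated task is sleeping on. Hypothetical, for illustration only. */
enum wait_kind { WAIT_NONE, WAIT_FUTEX, WAIT_OTHER };

/* Stand-in for a task; not the kernel's task_struct. */
struct sim_task {
	enum wait_kind waiting_on;
	unsigned int spurious;	/* wakeups that hit a wait they were not meant for */
};

#define WAKE_LIST_MAX 4

/* Wakeups collected under the lock, delivered after it is dropped. */
struct sim_wake_list {
	struct sim_task *task[WAKE_LIST_MAX];
	enum wait_kind meant_for[WAKE_LIST_MAX];
	size_t nr;
};

/* Queue a wakeup; returns false if the list is full. */
static bool wake_list_add(struct sim_wake_list *wl, struct sim_task *t,
			  enum wait_kind meant_for)
{
	if (wl->nr >= WAKE_LIST_MAX)
		return false;
	wl->task[wl->nr] = t;
	wl->meant_for[wl->nr] = meant_for;
	wl->nr++;
	return true;
}

/* Deliver one wakeup and record whether it matched the task's current wait. */
static void sim_wake(struct sim_task *t, enum wait_kind meant_for)
{
	if (t->waiting_on == WAIT_NONE)
		return;		/* task is already running, nothing to do */
	if (t->waiting_on != meant_for)
		t->spurious++;	/* woke a wait this wakeup was not meant for */
	t->waiting_on = WAIT_NONE;
}

/* Deliver all queued wakeups; meant to be called after the lock is released. */
static void wake_list_flush(struct sim_wake_list *wl)
{
	size_t i;

	for (i = 0; i < wl->nr; i++)
		sim_wake(wl->task[i], wl->meant_for[i]);
	wl->nr = 0;
}
```

If the list is flushed while the task still sleeps on WAIT_FUTEX, the
spurious counter stays at 0. If the task has moved on to WAIT_OTHER in
between, the same flush bumps it to 1.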

There are about 1400 schedule() calls (or there were that many 2 years
ago; might be more by now), many of which are open-coded wait constructs,
most of them buggy in one way or another.

So we first need to audit / fix all those before we can do this one.
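
To make the difference concrete, here is a userspace sketch of a fragile
open-coded wait next to one that tolerates spurious wakeups. Everything
in it is hypothetical: sim_block() merely stands in for schedule(), and
the wakeups are fed from a fixed script so the behaviour is
deterministic. The only point is that the robust variant rechecks its
condition after every wakeup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event script: what each simulated wakeup turns out to be. */
enum wake_kind { WAKE_SPURIOUS, WAKE_REAL };

struct sim_waiter {
	const enum wake_kind *script;	/* wakeups to deliver, in order */
	size_t len;
	size_t pos;			/* number of wakeups consumed so far */
	bool cond;			/* the condition the waiter cares about */
};

/*
 * Stand-in for schedule(): returns on the next scripted wakeup.
 * Only a real wakeup makes the condition true. Once the script is
 * exhausted the condition is forced true so that loops terminate.
 */
static void sim_block(struct sim_waiter *w)
{
	if (w->pos >= w->len || w->script[w->pos] == WAKE_REAL)
		w->cond = true;
	if (w->pos < w->len)
		w->pos++;
}

/* Fragile: blocks once and assumes the wakeup means the condition holds. */
static bool wait_once(struct sim_waiter *w)
{
	if (!w->cond)
		sim_block(w);
	return w->cond;	/* false means the caller went ahead too early */
}

/* Robust: rechecks the condition after every wakeup. */
static bool wait_loop(struct sim_waiter *w)
{
	while (!w->cond)
		sim_block(w);
	return w->cond;
}
```

With a script of one spurious wakeup followed by a real one, wait_once()
returns with the condition still false, while wait_loop() consumes both
wakeups and returns with it true.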

I used to have a patch to schedule() that would always immediately fall
through and only actually block on the second call; it illustrated the
problem really well, in fact so well that the kernel fails to boot most
times.
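
The patch itself is not in this thread, so the following is only a
userspace model of one reading of that description: every other call to
a schedule() stand-in returns immediately, and only the following call
"blocks". The names sim_task, sim_schedule and calls_until_cond are
invented for the sketch, and a real block is modelled as returning only
after the waker has made the condition true.

```c
#include <stdbool.h>

/* Hypothetical model of a task's view of schedule(). */
struct sim_task {
	bool fell_through;	/* true once a call has been skipped */
	bool cond;		/* condition a waker sets before a real wakeup */
	unsigned int blocks;	/* number of calls that really blocked */
};

/*
 * The first call returns immediately (an injected spurious return);
 * the following call blocks for real and then rearms the injection.
 */
static void sim_schedule(struct sim_task *t)
{
	if (!t->fell_through) {
		t->fell_through = true;
		return;
	}
	t->fell_through = false;
	t->blocks++;
	t->cond = true;	/* model: a real block ends with cond set */
}

/* Counts how many schedule() calls a rechecking wait loop needs. */
static unsigned int calls_until_cond(struct sim_task *t)
{
	unsigned int calls = 0;

	while (!t->cond) {
		sim_schedule(t);
		calls++;
	}
	return calls;
}
```

A caller that invokes sim_schedule() once and then assumes the condition
holds sees it still false under this model, which is the kind of breakage
described above. A loop that rechecks simply takes two calls instead of
one.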