Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss wakeup?

From: Linus Torvalds
Date: Sun Jun 28 2020 - 01:39:51 EST


On Fri, Jun 26, 2020 at 8:43 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> I ended up with something like the below.. but it is too warm to think
> properly.
>
> I don't particularly like WQ_FLAG_PAGEWAITERS, but I liked open-coding
> all that even less.

Ugh.

I think I have a much simpler approach, actually.

So the *problem* is purely around that

if (behavior == EXCLUSIVE) {
if (!test_and_set_bit_lock(bit_nr, &page->flags))
break;
} else ..

and in particular it is purely *after* that test_and_set_bit_lock()
case. We have two cases:

(a) *If* we get the lock there, we're good, and all done, and we have
the lock. We don't care about any other wakeups, because they'll be
stale anyway (the thing that released the lock that we just took) and
because we got the lock, no other exclusive waiters should be woken up
anyway (and we will in turn wake up any waiters when we release it)

(b) we did *not* get the lock, because somebody else got it and is
about to immediately unlock again. And that _future_ wakeup that they
send might get lost because it might end up targeting us (but we might
just exit and not care).

Agreed?

So the only case that really matters is that we didn't get the lock,
but we must *not* be woken up afterwards.

So how about the attached trivial two-liner? We solve the problem by
simply marking ourselves TASK_RUNNING, which means that we won't be
counted as an exclusive wakeup.

Ok, so the "one" line to do that is that is actually two lines:

__set_current_state(TASK_RUNNING);
smp_mb__before_atomic();

and there's four lines of comments to go with it, but it really is
very simple: if we do that before we do the test_and_set_bit_lock(),
no wakeups will be lost, because we won't be sleeping for that wakeup.

I'm not entirely happy about that "smp_mb__before_atomic()". I think
it's right in practice that test_and_set_bit_lock() (when it actually
does a write) has at LEAST atomic seqmantics, so I think it's good.
But it's not pretty.

But I don't want to use a heavy

set_current_state(TASK_RUNNING);
if (!test_and_set_bit_lock(bit_nr, &page->flags)) ..

sequence, because at least on x86, that test_and_set_bit_lock()
already has a memory barrier in it, so the extra memory barrier from
set_current_state() is all kinds of pointless.

Hmm?

Linus

Attachment: patch
Description: Binary data