Re: wait_on_page_bit_common(TASK_KILLABLE, EXCLUSIVE) can miss wakeup?

From: Linus Torvalds
Date: Wed Jun 24 2020 - 12:37:16 EST


On Wed, Jun 24, 2020 at 9:20 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> If T1 is killed it is TASK_RUNNING, try_to_wake_up() should return 0.

Hmm. I already acknowledge your bug, but yeah, this is subtle.

But I think the bug still exists.

So the requirement is:

- bit_is_set returns false so we don't call io_schedule: we're still
TASK_KILLABLE

- somebody else gets the lock, so the test_and_set_bit_lock() fails

- that somebody else releases the lock almost immediately, and wakes
us up because we're still on the wait-queue (and still TASK_KILLABLE,
not TASK_RUNNING)

- another party sends us a SIGKILL

- we see the signal_pending_state() and exit

- we've now been woken up, but didn't wake anybody else up, and the
lock is released but there may be waiters who came in at the same time
and never saw the wakeup.

I think this is basically impossible to hit in practice, but it does
look like a real bug.

Maybe I'm missing something. This code is subtle.

But as mentioned, I _think_ the real problem is that "don't do
io_schedule()" optimization, because that's the thing that means that
we can miss the wakeup.

If we only had a single "test_and_set_bit_lock()" thing, we'd be ok.
Either we got the lock or we didn't, and if we didn't we'd schedule
and re-try, and we'd never have the race between testing signals and
bits while we're "sleeping" and can be woken up and try_to_wake_up()
thought we took it.

Linus