RE: [PATCH v2] futex: lower the lock contention on the HB lock during wake up

From: Thomas Gleixner
Date: Tue Sep 15 2015 - 20:01:42 EST


On Tue, 15 Sep 2015, Zhu Jefferry wrote:

Please configure your e-mail client properly and follow the basic rules:

- Choose a meaningful subject for your questions

You just copied a random subject line from some other mail thread,
which makes your mail look like a patch. But it's not a patch. You
have a question about futexes and patches related to them.

- Make sure your lines are no longer than 72 to 76 characters in length.

You can't assume that all e-mail and news clients behave like yours, and while yours might wrap lines automatically when the text reaches the right of the window containing it, not all do.

For the sentence above I need a 190 character wide display ....

- Do not use colors or other gimmicks. They just make the mail
unreadable in simple text based readers.

> Just in the list, I see the patch "[PATCH v2] futex: lower the lock
> contention on the HB lock during wake up" at
> http://www.gossamer-threads.com/lists/linux/kernel/2199938?search_string=futex;#2199938.

> But I see another patch with same name, different content here,
> 23b7776290b10297fe2cae0fb5f166a4f2c68121(http://code.metager.de/source/xref/linux/stable/kernel/futex.c?r=23b7776290b10297fe2cae0fb5f166a4f2c68121)

I have no idea what that metager thing tells you and I really don't
want to know. Plain git tells me:

# git show 23b7776290b10297fe2cae0fb5f166a4f2c68121
Merge: 6bc4c3ad3619 6fab54101923
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Mon Jun 22 15:52:04 2015 -0700

Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

So that's a merge commit where Linus pulled a pile of changes into the
mainline kernel. And that merge does not contain the patch above, but
it contains a different change to the futex code.

> Could you please help to give a little bit more explanation on this,
> why they have same name with different modify in the futex.c? I'm a
> newbie in the community.

Use the proper tools and not some random web interface. The commit you
are looking for is a completely different one.

# git log kernel/futex.c
....
commit 802ab58da74bb49ab348d2872190ef26ddc1a3e0
Author: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Date: Wed Jun 17 10:33:50 2015 +0200

futex: Lower the lock contention on the HB lock during wake up
....

And that's the same as the one in the LKML thread plus a fixup.

> Actually, I encounter a customer issue which is related to the glibc
> code "pthread_mutex_lock", which is using the futex service in
> kernel, without the patches above.

The patches above are merely an optimization and completely unrelated
to your problem.

You fail to provide the real interesting information here:

- Which architecture/SoC
- Which kernel version and which extra patches
- Which glibc version and which extra patches

> After lots of customer discussing, ( I could not reproduce the
> failure in my office), I seriously suspect there might be some
> particular corner cases in the futex code.

The futex code is more or less a conglomerate of corner cases.

But again you fail to provide the real interesting information:

- What is the actual failure ?

The information that you discussed that with your customer is
completely irrelevant and your suspicion does not clarify the issue
either.

> In the unlock flow, the user space code (pthread_mutex_unlock) will
> check FUTEX_WAITERS flag first, then wakeup the waiters in the
> kernel list. But in the lock flow, the kernel code (futex) will set
> FUTEX_WAITERS in first too, then try to get the waiter from the
> list. They are following same sequence, flag first, entry in list
> secondly. But there might be some timing problem in SMP system, if
> the query (unlock flow) is executing just before the list adding
> action (lock flow).

There might be a timing problem if the code looked like the scheme you
painted below, but it does not.

> It might cause the mutex is never really released, and other threads
> will infinite waiting. Could you please help to take a look at it?
>
> CPU 0 (thread 0) CPU 1 (thread 1)
>
> mutex_lock
> val = *futex;
> sys_futex(LOCK_PI, futex, val);
>
> return to user space

If the futex is uncontended then you don't enter the kernel for
acquiring the futex.

> after acquire the lock mutex_lock
> val = *futex;
> sys_futex(LOCK_PI, futex, val);

The futex FUTEX_LOCK_PI operation does not take the user space value. That's
what FUTEX_WAIT does.

> lock(hash_bucket(futex));
> set FUTEX_WAITERS flag
> unlock(hash_bucket(futex)) and retry due to page fault

So here you are completely off the track. If the 'set FUTEX_WAITERS
bit' operation fails due to a page fault, then the FUTEX_WAITERS bit
is not set. So it cannot be observed on the other core.

The flow is:

sys_futex(LOCK_PI, futex, ...)

retry:
    lock(hb(futex));
    ret = set_waiter_bit(futex);
    if (ret == -EFAULT) {
        unlock(hb(futex));
        handle_fault();
        goto retry;
    }

    list_add();
    unlock(hb(futex));
    schedule();

So when set_waiter_bit() succeeds, then the hash bucket lock is held
and blocks the waker. So it's guaranteed that the waker will see the
waiter on the list.

If set_waiter_bit() faults, then the waiter bit is not set and
therefore there is nothing to wake. So the waker will not enter the
kernel because the futex is uncontended.

So now, let's assume that the waiter failed to set the waiter bit and
the waker unlocked the futex. When the waiter retries then it actually
checks whether the futex still has an owner. So it observes the owner
has been cleared, it acquires the futex and returns.

It's a bit more complex than that due to handling of the gazillion of
corner cases, but that's the basic synchronization mechanism and there
is no hidden timing issue on SMP.

Random speculation is not helping here.

Thanks,

tglx