Re: [PATCH RESEND v4] sched/fair: Add advisory flag for borrowing a timeslice

From: Rik van Riel
Date: Tue Dec 23 2014 - 17:35:31 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/23/2014 03:47 PM, Khalid Aziz wrote:

>> You are right. Uncontended futex is very fast since it never goes
>> into kernel. Queuing problem happens when the lock holder has
>> been pre-empted. Adaptive spinning does the smart thing of
>> spin-waiting only if the lock holder is still running on another
>> core. If lock holder is not scheduled on any core, even adaptive
>> spinning has to go into the kernel to be put on wait queue. What
>> would avoid queuing problem and reduce the cost of contention is
>> a combination of adaptive spinning, and a way to keep the lock
>> holder running on one of the cores just a little longer so it can
>> release the lock. Without creating special case and a new API in
>> kernel, one way I can think of accomplishing the second part is
>> to boost the priority of lock holder when contention happens and
>> priority ceiling is meant to do exactly that. Priority ceiling
>> implementation in glibc boosts the priority by calling into
>> scheduler which does incur the cost of a system call. Priority
>> boost is a reliable solution that does not change scheduling
>> semantics. The solution allowing lock holder to use one extra
>> timeslice is not a definitive solution but tpcc workload shows it
>> does work and it works without requiring changes to database
>> locking code.
>
>> Theoretically a new locking library that uses both these
>> techniques will help solve the problem but being a new locking
>> library, there is a big unknown of what new problems, performance
>> and otherwise, it will bring and database has to recode to this
>> new library. Nevertheless this is the path I am exploring now.
>> The challenge being how to do this without requiring changes to
>> database code or the kernel. The hooks available to me into
>> current database code are schedctl_init(), schedctl_start() and
>> schedctl_stop() which are no-op on Linux at this time.

That sounds like a feature. Keep the uncontended operations fast
by not doing anything, and only slow down when there is contention.

Presumably the database people will optimize their code to avoid
contention, so any complexity can happen in the slow path, instead
of by adding things to the fast path...
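
To make the fast path / slow path split concrete, here is a rough
userspace sketch. It is not glibc's mutex and not anything from the
patch; the names (sketch_lock, sketch_trylock, slow_paths) are made up
for illustration, and sched_yield() is only a stand-in for whatever the
slow path really does (futex wait, priority boost, ...).

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; not from glibc or the kernel. */
struct sketch_lock {
	atomic_int  held;        /* 0 = free, 1 = held */
	atomic_long slow_paths;  /* acquisitions that hit contention */
};

/* Fast path: a single compare-and-swap in userspace, no system call. */
static bool sketch_trylock(struct sketch_lock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(
		&l->held, &expected, 1,
		memory_order_acquire, memory_order_relaxed);
}

static void sketch_lock_acquire(struct sketch_lock *l)
{
	if (sketch_trylock(l))
		return;		/* uncontended: nothing else to do */

	/*
	 * Slow path: only reached when somebody else holds the lock.
	 * Any extra complexity belongs here. sched_yield() is an
	 * assumption standing in for a real wait mechanism.
	 */
	atomic_fetch_add_explicit(&l->slow_paths, 1, memory_order_relaxed);
	do {
		sched_yield();
	} while (!sketch_trylock(l));
}

static void sketch_lock_release(struct sketch_lock *l)
{
	int was = atomic_exchange_explicit(&l->held, 0,
					   memory_order_release);

	assert(was == 1);	/* releasing a lock that was not held */
	(void)was;
}
```

With no contention, sketch_lock_acquire() never gets past the first
compare-and-swap and slow_paths stays at zero, which is the property
worth preserving.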

- --
All rights reversed
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUme3NAAoJEM553pKExN6DD8gH/3am5Izrobk/AiN8sijg3YXA
a9orVuoWNE+BLt49PwWrYpjsR2AgN4G3BbUrb4GVhaFBL5/v/frUhk0As3w3uM21
QjxMtaFvqZviLWCFgtIna7zSxHom+v/eRiAjLtCoX+GtHs+t25Jyf1GowmZnkoNd
UtDPHPXmyA2CqZC0E9d53Uzb9XaP/T4G3J8U2aPSvwoj4Nw85H2S/QMptNQEJDjY
0Qpx/fv2Ze/gJ7GujU3gloX6cH5DDU+p9/pFZ7iDEB6jbbb384Zuacq6R6CeJMVB
EAxKW1tpFtPvaRC51x8TFNJY5FxSISbXKbehxKjXQ8rlkcM/k1euzo2KCKOp68w=
=cTlU
-----END PGP SIGNATURE-----