[GIT PULL] locking changes for v4.9

From: Ingo Molnar
Date: Mon Oct 03 2016 - 03:09:41 EST


Linus,

Please pull the latest locking-core-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-core-for-linus

# HEAD: 08645077b7f9f7824dbaf1959b0e014a894c8acc x86/cmpxchg, locking/atomics: Remove superfluous definitions

The main changes in this cycle were:

- rwsem micro-optimizations (Davidlohr Bueso)

- Improve the implementation and optimize the performance of percpu-rwsems.
  (Peter Zijlstra)

- Convert all lglock users to better facilities such as percpu-rwsems or
percpu-spinlocks and remove lglocks. (Peter Zijlstra)

- Remove the ticket (spin)lock implementation. (Peter Zijlstra)

- Korean translation of memory-barriers.txt and related fixes to the
English document. (SeongJae Park)

- misc fixes and cleanups

Thanks,

Ingo

------------------>
Davidlohr Bueso (3):
locking/rwsem: Return void in __rwsem_mark_wake()
locking/rwsem: Remove a few useless comments
locking/rwsem: Scan the wait_list for readers only once

Jan Beulich (1):
locking/rwsem, x86: Drop a bogus cc clobber

Nikolay Borisov (1):
x86/cmpxchg, locking/atomics: Remove superfluous definitions

Oleg Nesterov (1):
stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()

Pan Xinhui (1):
locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()

Peter Zijlstra (10):
locking/qspinlock: Improve readability
locking/percpu-rwsem: Optimize readers and reduce global impact
locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEM and percpu_rwsem_assert_held()
fs/locks: Replace lg_global with a percpu-rwsem
fs/locks: Replace lg_local with a per-cpu spinlock
locking/percpu-rwsem: Add down_read_preempt_disable()
fs/locks: Use percpu_down_read_preempt_disable()
locking/lglock: Remove lglock implementation
x86, locking/spinlocks: Remove ticket (spin)lock implementation

SeongJae Park (4):
locking/Documentation: Maintain consistent blank line
locking/Documentation: Fix wrong section reference
locking/Documentation: Fix a typo of example result
locking/Documentation: Add Korean translation

Thomas Gleixner (1):
futex: Add some more function commentry

Vegard Nossum (1):
locking/hung_task: Show all locks

Waiman Long (1):
locking/pvstat: Separate wait_again and spurious wakeup stats


Documentation/ko_KR/memory-barriers.txt | 3135 +++++++++++++++++++++++++++++++
Documentation/locking/lglock.txt | 166 --
Documentation/memory-barriers.txt | 5 +-
arch/x86/Kconfig | 3 +-
arch/x86/include/asm/cmpxchg.h | 44 -
arch/x86/include/asm/paravirt.h | 18 -
arch/x86/include/asm/paravirt_types.h | 7 -
arch/x86/include/asm/rwsem.h | 2 +-
arch/x86/include/asm/spinlock.h | 174 --
arch/x86/include/asm/spinlock_types.h | 13 -
arch/x86/kernel/kvm.c | 245 ---
arch/x86/kernel/paravirt-spinlocks.c | 7 -
arch/x86/kernel/paravirt_patch_32.c | 4 +-
arch/x86/kernel/paravirt_patch_64.c | 4 +-
arch/x86/xen/spinlock.c | 250 +--
fs/Kconfig | 1 +
fs/locks.c | 68 +-
include/linux/lglock.h | 81 -
include/linux/percpu-rwsem.h | 108 +-
include/linux/rcu_sync.h | 1 +
kernel/cgroup.c | 6 +
kernel/futex.c | 15 +-
kernel/hung_task.c | 2 +-
kernel/locking/Makefile | 1 -
kernel/locking/lglock.c | 111 --
kernel/locking/percpu-rwsem.c | 228 ++-
kernel/locking/qspinlock_paravirt.h | 26 +-
kernel/locking/qspinlock_stat.h | 4 +-
kernel/locking/rwsem-xadd.c | 92 +-
kernel/rcu/sync.c | 14 +
kernel/stop_machine.c | 42 +-
31 files changed, 3540 insertions(+), 1337 deletions(-)
create mode 100644 Documentation/ko_KR/memory-barriers.txt
delete mode 100644 Documentation/locking/lglock.txt
delete mode 100644 include/linux/lglock.h
delete mode 100644 kernel/locking/lglock.c

diff --git a/Documentation/ko_KR/memory-barriers.txt b/Documentation/ko_KR/memory-barriers.txt
new file mode 100644
index 000000000000..34d3d380893d
--- /dev/null
+++ b/Documentation/ko_KR/memory-barriers.txt
@@ -0,0 +1,3135 @@
+NOTE:
+This is a version of Documentation/memory-barriers.txt translated into Korean.
+This document is maintained by SeongJae Park <sj38.park@xxxxxxxxx>.
+If you find any difference between this document and the original file or
+a problem with the translation, please contact the maintainer of this file.
+
+Please also note that the purpose of this file is to be easier to
+read for non English (read: Korean) speakers and is not intended as
+a fork. So if you have any comments or updates for this file please
+update the original English file first. The English version is
+definitive, and readers should look there if they have any doubt.
+
+===================================
+This document is a translation of
+Documentation/memory-barriers.txt
+into Korean.
+
+Translated by: SeongJae Park <sj38.park@xxxxxxxxx>
+===================================
+
+
+			 ============================
+			 LINUX KERNEL MEMORY BARRIERS
+			 ============================
+
+By: David Howells <dhowells@xxxxxxxxxx>
+    Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
+    Will Deacon <will.deacon@xxxxxxx>
+    Peter Zijlstra <peterz@xxxxxxxxxxxxx>
+
+==========
+DISCLAIMER
+==========
+
+This document is not a specification; it is intentionally (for the sake of
+brevity) and unintentionally (due to being human) incomplete.  This document
+is meant as a guide to using the various memory barriers provided by Linux,
+but in case of any doubt (and there are many) please ask.
+
+To repeat, this document is not a specification of what Linux expects from
+hardware.
+
+The purpose of this document is twofold:
+
+ (1) to specify the minimum functionality that one can rely on for any
+     particular barrier, and
+
+ (2) to provide a guide as to how to use the barriers that are available.
+
+Note that an architecture can provide more than the minimum requirement for
+any particular barrier, but if the architecture provides less than that, that
+architecture is incorrect.
+
+Note also that it is possible that a barrier may be a no-op for an
+architecture because the way that architecture works renders an explicit
+barrier unnecessary in that case.
+
+Translator's note: this translation, too, is not complete, and that is partly
+intentional.  As with other technical documents, please read the translation
+together with the original English text, using the translation as a guide;
+the English version is definitive, and feedback on any mistranslation is
+always welcome.  Where a term is ambiguous, the original term is borrowed
+rather than over-translated.
+
+
+========
+CONTENTS
+========
+
+ (*) Abstract memory access model.
+
+     - Device operations.
+     - Guarantees.
+
+ (*) What are memory barriers?
+
+     - Varieties of memory barrier.
+     - What may not be assumed about memory barriers?
+     - Data dependency barriers.
+     - Control dependencies.
+     - SMP barrier pairing.
+     - Examples of memory barrier sequences.
+     - Read memory barriers vs load speculation.
+     - Transitivity
+
+ (*) Explicit kernel barriers.
+
+     - Compiler barrier.
+     - CPU memory barriers.
+     - MMIO write barrier.
+
+ (*) Implicit kernel memory barriers.
+
+     - Lock acquisition functions.
+     - Interrupt disabling functions.
+     - Sleep and wake-up functions.
+     - Miscellaneous functions.
+
+ (*) Inter-CPU acquiring barrier effects.
+
+     - Acquires vs memory accesses.
+     - Acquires vs I/O accesses.
+
+ (*) Where are memory barriers needed?
+
+     - Interprocessor interaction.
+     - Atomic operations.
+     - Accessing devices.
+     - Interrupts.
+
+ (*) Kernel I/O barrier effects.
+
+ (*) Assumed minimum execution ordering model.
+
+ (*) The effects of the CPU cache.
+
+     - Cache coherency.
+     - Cache coherency vs DMA.
+     - Cache coherency vs MMIO.
+
+ (*) The things CPUs get up to.
+
+     - And then there's the Alpha.
+     - Virtual Machine Guests.
+
+ (*) Example uses.
+
+     - Circular buffers.
+
+ (*) References.
+
+
+============================
+ABSTRACT MEMORY ACCESS MODEL
+============================
+
+Consider the following abstract model of the system:
+
+		            :                :
+		            :                :
+		            :                :
+		+-------+   :   +--------+   :   +-------+
+		|       |   :   |        |   :   |       |
+		|       |   :   |        |   :   |       |
+		| CPU 1 |<----->| Memory |<----->| CPU 2 |
+		|       |   :   |        |   :   |       |
+		|       |   :   |        |   :   |       |
+		+-------+   :   +--------+   :   +-------+
+		    ^       :       ^        :       ^
+		    |       :       |        :       |
+		    |       :       |        :       |
+		    |       :       v        :       |
+		    |       :   +--------+   :       |
+		    |       :   |        |   :       |
+		    |       :   |        |   :       |
+		    +---------->| Device |<----------+
+		            :   |        |   :
+		            :   |        |   :
+		            :   +--------+   :
+		            :                :
+
+Each CPU executes a program that generates memory access operations.  In the
+abstract CPU, memory operation ordering is very relaxed, and a CPU may
+actually perform the memory operations in any order it likes, provided
+program causality appears to be maintained.  Similarly, the compiler may also
+arrange the instructions it emits in any order it likes, provided it doesn't
+affect the apparent operation of the program.
+
+So in the above diagram, the effects of the memory operations performed by a
+CPU are perceived by the rest of the system as the operations cross the
+interface between the CPU and the rest of the system (the dotted lines).
+
+
+For example, consider the following sequence of events:
+
+	CPU 1		CPU 2
+	===============	===============
+	{ A == 1; B == 2 }
+	A = 3;		x = B;
+	B = 4;		y = A;
+
+The set of accesses as seen by the memory system in the middle can be
+arranged in 24 different combinations:
+
+	STORE A=3,	STORE B=4,	y=LOAD A->3,	x=LOAD B->4
+	STORE A=3,	STORE B=4,	x=LOAD B->4,	y=LOAD A->3
+	STORE A=3,	y=LOAD A->3,	STORE B=4,	x=LOAD B->4
+	STORE A=3,	y=LOAD A->3,	x=LOAD B->2,	STORE B=4
+	STORE A=3,	x=LOAD B->2,	STORE B=4,	y=LOAD A->3
+	STORE A=3,	x=LOAD B->2,	y=LOAD A->3,	STORE B=4
+	STORE B=4,	STORE A=3,	y=LOAD A->3,	x=LOAD B->4
+	STORE B=4, ...
+	...
+
+and can thus result in four different combinations of values:
+
+	x == 2, y == 1
+	x == 2, y == 3
+	x == 4, y == 1
+	x == 4, y == 3
+
+
+Furthermore, the stores committed by a CPU to the memory system may not be
+perceived by the loads made by another CPU in the same order as the stores
+were committed.
+
+
+As a further example, consider this sequence of events:
+
+	CPU 1		CPU 2
+	===============	===============
+	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
+	B = 4;		Q = P;
+	P = &B		D = *Q;
+
+There is an obvious data dependency here, as the value loaded into D depends
+on the address retrieved from P by CPU 2.  At the end of the sequence, any of
+the following results are possible:
+
+	(Q == &A) and (D == 1)
+	(Q == &B) and (D == 2)
+	(Q == &B) and (D == 4)
+
+Note that CPU 2 will never try and load C into D because the CPU will load P
+into Q before issuing the load of *Q.
+
+
+DEVICE OPERATIONS
+-----------------
+
+Some devices present their control interfaces as collections of memory
+locations, but the order in which the control registers are accessed is very
+important.  For instance, imagine an ethernet card with a set of internal
+registers that are accessed through an address port register (A) and a data
+port register (D).  To read internal register 5, the following code might
+then be used:
+
+	*A = 5;
+	x = *D;
+
+but this might show up as either of the following two sequences:
+
+	STORE *A = 5, x = LOAD *D
+	x = LOAD *D, STORE *A = 5
+
+the second of which will almost certainly result in a malfunction, since it
+set the address _after_ attempting to read the register.
+
+
+==========
+GUARANTEES
+==========
+
+There are some minimal guarantees that may be expected of a CPU:
+
+ (*) On any given CPU, dependent memory accesses will be issued in order,
+     with respect to itself.  This means that for:
+
+	Q = READ_ONCE(P); smp_read_barrier_depends(); D = READ_ONCE(*Q);
+
+     the CPU will issue the following memory operations:
+
+	Q = LOAD P, D = LOAD *Q
+
+     and always in that order.  On most systems, smp_read_barrier_depends()
+     does nothing, but it is required for DEC Alpha.  Please note that you
+     should normally use something like rcu_dereference() instead of
+     open-coding smp_read_barrier_depends().
+
+ (*) Overlapping loads and stores within a particular CPU will appear to be
+     ordered within that CPU.  This means that for:
+
+	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
+
+     the CPU will only issue the following sequence of memory operations:
+
+	a = LOAD *X, STORE *X = b
+
+     And for:
+
+	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
+
+     the CPU will only issue:
+
+	STORE *X = c, d = LOAD *X
+
+     (Loads and stores overlap if they are targeted at overlapping pieces of
+     memory).
+
+And there are a number of things that _must_ or _must_not_ be assumed:
+
+ (*) It _must_not_ be assumed that the compiler will do what you want with
+     memory references that are not protected by READ_ONCE() and
+     WRITE_ONCE().  Without them, the compiler is within its rights to do
+     all sorts of "creative" transformations, which are covered in the
+     COMPILER BARRIER section.
+
+ (*) It _must_not_ be assumed that independent loads and stores will be
+     issued in the order given.  This means that for:
+
+	X = *A; Y = *B; *D = Z;
+
+     we may get any of the following sequences:
+
+	X = LOAD *A,  Y = LOAD *B,  STORE *D = Z
+	X = LOAD *A,  STORE *D = Z, Y = LOAD *B
+	Y = LOAD *B,  X = LOAD *A,  STORE *D = Z
+	Y = LOAD *B,  STORE *D = Z, X = LOAD *A
+	STORE *D = Z, X = LOAD *A,  Y = LOAD *B
+	STORE *D = Z, Y = LOAD *B,  X = LOAD *A
+
+ (*) It _must_ be assumed that overlapping memory accesses may be merged or
+     discarded.  This means that for:
+
+	X = *A; Y = *(A + 4);
+
+     we may get any one of the following sequences:
+
+	X = LOAD *A; Y = LOAD *(A + 4);
+	Y = LOAD *(A + 4); X = LOAD *A;
+	{X, Y} = LOAD {*A, *(A + 4) };
+
+     And for:
+
+	*A = X; *(A + 4) = Y;
+
+     we may get any of:
+
+	STORE *A = X; STORE *(A + 4) = Y;
+	STORE *(A + 4) = Y; STORE *A = X;
+	STORE {*A, *(A + 4) } = {X, Y};
+
+And there are anti-guarantees:
+
+ (*) These guarantees do not apply to bitfields, because compilers often
+     generate code to modify these using non-atomic read-modify-write
+     sequences.  Do not attempt to use bitfields to synchronize parallel
+     algorithms.
+
+ (*) Even in cases where bitfields are protected by locks, all fields in a
+     given bitfield must be protected by one lock.  If two fields in a given
+     bitfield are protected by different locks, the compiler's non-atomic
+     read-modify-write sequences can cause an update to one field to corrupt
+     the value of an adjacent field.
+
+ (*) These guarantees apply only to properly aligned and sized scalar
+     variables.  "Properly sized" currently means variables that are the
+     same size as "char", "short", "int" and "long".  "Properly aligned"
+     means the natural alignment, thus no constraints for "char", two-byte
+     alignment for "short", four-byte alignment for "int", and either
+     four-byte or eight-byte alignment for "long", on 32-bit and 64-bit
+     systems, respectively.  Note that these guarantees were introduced into
+     the C11 standard, so beware when using older pre-C11 compilers (for
+     example, gcc 4.6).  The portion of the standard containing this
+     guarantee is Section 3.14, which defines "memory location" as follows:
+     (Translator's note: the standard's definition is quoted verbatim below)
+
+	memory location
+		either an object of scalar type, or a maximal sequence
+		of adjacent bit-fields all having nonzero width
+
+		NOTE 1: Two threads of execution can update and access
+		separate memory locations without interfering with
+		each other.
+
+		NOTE 2: A bit-field and an adjacent non-bit-field member
+		are in separate memory locations. The same applies
+		to two bit-fields, if one is declared inside a nested
+		structure declaration and the other is not, or if the two
+		are separated by a zero-length bit-field declaration,
+		or if they are separated by a non-bit-field member
+		declaration. It is not safe to concurrently update two
+		bit-fields in the same structure if all members declared
+		between them are also bit-fields, no matter what the
+		sizes of those intervening bit-fields happen to be.
+
+
+=========================
+WHAT ARE MEMORY BARRIERS?
+=========================
+
+As can be seen above, independent memory operations are effectively performed
+in random order, but this can be a problem for CPU-CPU interaction and for
+I/O.  What is required is some way of intervening to instruct the compiler
+and the CPU to restrict the order.
+
+Memory barriers are such interventions.  They impose a perceived partial
+ordering over the memory operations on either side of the barrier.
+
+Such enforcement is important because the CPUs and other devices in a system
+can use a variety of tricks to improve performance, including reordering,
+deferral and combination of memory operations; speculative loads; speculative
+branch prediction and various types of caching.  Memory barriers are used to
+override or suppress these tricks, allowing the code to sanely control the
+interaction of multiple CPUs and/or devices.
+
+
+VARIETIES OF MEMORY BARRIER
+---------------------------
+
+Memory barriers come in four basic varieties:
+
+ (1) Write (or store) memory barriers.
+
+     A write memory barrier gives a guarantee that all the STORE operations
+     specified before the barrier will appear to happen before all the STORE
+     operations specified after the barrier with respect to the other
+     components of the system.
+
+     A write barrier is a partial ordering on stores only; it is not required
+     to have any effect on loads.
+
+     A CPU can be viewed as committing a sequence of store operations to the
+     memory system as time progresses.  All stores before a write barrier
+     will occur in the sequence _before_ all the stores after the write
+     barrier.
+
+     [!] Note that write barriers should normally be paired with read or data
+     dependency barriers; see the "SMP barrier pairing" subsection.
+
+
+ (2) Data dependency barriers.
+
+     A data dependency barrier is a weaker form of read barrier.  In the case
+     where two loads are performed such that the second depends on the result
+     of the first (eg: the first load retrieves the address to which the
+     second load will be directed), a data dependency barrier would be
+     required to make sure that the target of the second load is updated
+     before the address obtained by the first load is accessed.
+
+     A data dependency barrier is a partial ordering on interdependent loads
+     only; it is not required to have any effect on stores, independent loads
+     or overlapping loads.
+
+     As mentioned in (1), the other CPUs in the system can be viewed as
+     committing sequences of stores to the memory system that the CPU being
+     considered can then perceive.  A data dependency barrier issued by the
+     CPU under consideration guarantees that for any load preceding it, if
+     that load touches one of a sequence of stores from another CPU, then by
+     the time the barrier completes, the effects of all the stores prior to
+     that touched by the load will be perceptible to any loads issued after
+     the data dependency barrier.
+
+     See the "Examples of memory barrier sequences" subsection for diagrams
+     showing the ordering constraints.
+
+     [!] Note that the first load really has to have a _data_ dependency and
+     not a control dependency.  If the address for the second load is
+     dependent on the first load, but the dependency is through a conditional
+     rather than actually loading the address itself, then it's a _control_
+     dependency and a full read barrier or better is required.  See the
+     "Control dependencies" subsection for more information.
+
+     [!] Note that data dependency barriers should normally be paired with
+     write barriers; see the "SMP barrier pairing" subsection.
+
+
+ (3) Read (or load) memory barriers.
+
+     A read barrier is a data dependency barrier plus a guarantee that all
+     the LOAD operations specified before the barrier will appear to happen
+     before all the LOAD operations specified after the barrier with respect
+     to the other components of the system.
+
+     A read barrier is a partial ordering on loads only; it is not required
+     to have any effect on stores.
+
+     Read memory barriers imply data dependency barriers, and so can
+     substitute for them.
+
+     [!] Note that read barriers should normally be paired with write
+     barriers; see the "SMP barrier pairing" subsection.
+
+
+ (4) General memory barriers.
+
+     A general memory barrier gives a guarantee that all the LOAD and STORE
+     operations specified before the barrier will appear to happen before all
+     the LOAD and STORE operations specified after the barrier with respect
+     to the other components of the system.
+
+     A general memory barrier is a partial ordering over both loads and
+     stores.
+
+     General memory barriers imply both read and write memory barriers, and
+     so can substitute for either.
+
+
+And a couple of implicit varieties:
+
+ (5) ACQUIRE operations.
+
+     This acts as a one-way permeable barrier.  It guarantees that all memory
+     operations after the ACQUIRE operation will appear to happen after the
+     ACQUIRE operation with respect to the other components of the system.
+     ACQUIRE operations include LOCK operations and both smp_load_acquire()
+     and smp_cond_acquire() operations.  The latter builds the necessary
+     ACQUIRE semantics from relying on a control dependency and smp_rmb().
+
+     Memory operations that occur before an ACQUIRE operation may appear to
+     happen after it completes.
+
+     An ACQUIRE operation should almost always be paired with a RELEASE
+     operation.
+
+
+ (6) RELEASE operations.
+
+     This also acts as a one-way permeable barrier.  It guarantees that all
+     memory operations before the RELEASE operation will appear to happen
+     before the RELEASE operation with respect to the other components of the
+     system.  RELEASE operations include UNLOCK operations and
+     smp_store_release() operations.
+
+     Memory operations that occur after a RELEASE operation may appear to
+     happen before it completes.
+
+     The use of ACQUIRE and RELEASE operations generally precludes the need
+     for other sorts of memory barrier (but note the exceptions mentioned in
+     the subsection "MMIO write barrier").  In addition, a RELEASE+ACQUIRE
+     pair is -not- guaranteed to act as a full memory barrier.  However,
+     after an ACQUIRE on a given variable, all memory accesses preceding any
+     prior RELEASE on that same variable are guaranteed to be visible.  In
+     other words, within a given variable's critical section, all accesses of
+     all previous critical sections for that variable are guaranteed to have
+     completed.
+
+     This means that ACQUIRE acts as a minimal "acquire" operation and
+     RELEASE acts as a minimal "release" operation.
+
+A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
+and RELEASE variants in addition to fully-ordered and relaxed (no barrier
+semantics) definitions.  For compound atomics performing both a load and a
+store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
+only to the store portion of the operation.
+
+Memory barriers are only required where there's a possibility of interaction
+between two CPUs or between a CPU and a device.  If it can be guaranteed that
+there won't be any such interaction in any particular piece of code, then
+memory barriers are unnecessary in that piece of code.
+
+
+Note that these are the _minimum_ guarantees.  Different architectures may
+give more substantial guarantees, but they may _not_ be relied upon outside
+of arch specific code.
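+
+To illustrate the ACQUIRE/RELEASE semantics of (5) and (6), here is a minimal
+sketch of handing a data item from one CPU to another with an
+smp_store_release()/smp_load_acquire() pair (the variable names here are made
+up for the example):
+
+	CPU 1				CPU 2
+	===============================	===============================
+	shared_data = compute();
+	smp_store_release(&data_ready, 1);
+					while (!smp_load_acquire(&data_ready))
+						cpu_relax();
+					do_something(shared_data);
+
+The RELEASE makes the store to shared_data visible before the flag, and the
+ACQUIRE orders the flag read before the subsequent read of shared_data, so
+CPU 2 cannot observe the flag set together with stale data.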
+
+
+WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS?
+----------------------------------------------
+
+There are certain things that the Linux kernel memory barriers do not
+guarantee:
+
+ (*) There is no guarantee that any of the memory accesses specified before a
+     memory barrier will be _complete_ by the completion of a memory barrier
+     instruction; the barrier can be considered to draw a line in that CPU's
+     access queue that accesses of the appropriate type may not cross.
+
+ (*) There is no guarantee that issuing a memory barrier on one CPU will have
+     any direct effect on another CPU or any other hardware in the system.
+     The indirect effect will be the order in which the second CPU sees the
+     effects of the first CPU's accesses occur, but see the next point:
+
+ (*) There is no guarantee that a CPU will see the correct order of effects
+     from a second CPU's accesses, even _if_ the second CPU uses a memory
+     barrier, unless the first CPU _also_ uses a matching memory barrier (see
+     the subsection on "SMP barrier pairing").
+
+ (*) There is no guarantee that some intervening piece of off-the-CPU
+     hardware[*] will not reorder the memory accesses.  CPU cache coherency
+     mechanisms should propagate the indirect effects of a memory barrier
+     between CPUs, but might not do so in order.
+
+	[*] For information on bus mastering DMA and coherency please read:
+
+	    Documentation/PCI/pci.txt
+	    Documentation/DMA-API-HOWTO.txt
+	    Documentation/DMA-API.txt
+
+
+DATA DEPENDENCY BARRIERS
+------------------------
+
+The usage requirements of data dependency barriers are a little subtle, and
+it's not always obvious that they're needed.  To illustrate, consider the
+following sequence of events:
+
+	CPU 1		      CPU 2
+	===============	      ===============
+	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
+	B = 4;
+	<write barrier>
+	WRITE_ONCE(P, &B)
+			      Q = READ_ONCE(P);
+			      D = *Q;
+
+There's a clear data dependency here, and it would seem that by the end of
+the sequence, Q must be either &A or &B, and that:
+
+	(Q == &A) implies (D == 1)
+	(Q == &B) implies (D == 4)
+
+But!  CPU 2's perception of P may be updated _before_ its perception of B,
+thus leading to the following situation:
+
+	(Q == &B) and (D == 2) ????
+
+Whilst this may seem like a failure of coherency or causality maintenance, it
+isn't, and this behaviour can be observed on certain real CPUs (such as the
+DEC Alpha).
+
+To deal with this, a data dependency barrier or better must be inserted
+between the address load and the data load:
+
+	CPU 1		      CPU 2
+	===============	      ===============
+	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
+	B = 4;
+	<write barrier>
+	WRITE_ONCE(P, &B);
+			      Q = READ_ONCE(P);
+			      <data dependency barrier>
+			      D = *Q;
+
+This enforces the occurrence of one of the two implications, and prevents the
+third possibility from arising.
+
+A data-dependency barrier must also order against dependent writes:
+
+	CPU 1		      CPU 2
+	===============	      ===============
+	{ A == 1, B == 2, C = 3, P == &A, Q == &C }
+	B = 4;
+	<write barrier>
+	WRITE_ONCE(P, &B);
+			      Q = READ_ONCE(P);
+			      <data dependency barrier>
+			      *Q = 5;
+
+The data-dependency barrier must order the read into Q with the store into
+*Q.  This prohibits this outcome:
+
+	(Q == &B) && (B == 4)
+
+Please note that this pattern should be rare.  After all, the whole point of
+dependency ordering is to -prevent- writes to the data structure, along with
+the expensive cache misses associated with those writes.  This pattern can be
+used to record rare error conditions and the like, and the ordering prevents
+such records from being lost.
+
+
+[!] Note that this extremely counterintuitive situation arises most easily on
+machines with split caches, so that, for example, one cache bank processes
+even-numbered cache lines and the other bank processes odd-numbered cache
+lines.  The pointer P might be stored in an odd-numbered cache line, and the
+variable B might be stored in an even-numbered cache line.  Then, if the
+even-numbered bank of the reading CPU's cache is extremely busy while the
+odd-numbered bank is idle, one can see the new value of the pointer P (&B),
+but the old value of the variable B (2).
+
+
+The data dependency barrier is very important to the RCU system, for example.
+See rcu_assign_pointer() and rcu_dereference() in include/linux/rcupdate.h.
+This permits the current target of an RCU'd pointer to be replaced with a new
+modified target, without the replacement target appearing to be incompletely
+initialised.
+
+See also the subsection on "Cache Coherency" for a more thorough example.
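+
+For illustration, a minimal sketch of the rcu_assign_pointer() /
+rcu_dereference() publication pattern mentioned above (the structure and
+variable names are made up for this example):
+
+	CPU 1				CPU 2
+	===============================	===============================
+	p = kmalloc(sizeof(*p), GFP_KERNEL);
+	p->a = 1;
+	p->b = 2;
+	rcu_assign_pointer(gp, p);
+					rcu_read_lock();
+					q = rcu_dereference(gp);
+					if (q)
+						do_something(q->a, q->b);
+					rcu_read_unlock();
+
+rcu_assign_pointer() supplies the write barrier on the publishing side and
+rcu_dereference() the data dependency barrier (on architectures that need
+one), so the reader can never see the new pointer together with the old,
+uninitialised field values.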
+
+
+CONTROL DEPENDENCIES
+--------------------
+
+A load-load control dependency requires a full read memory barrier, not
+simply a data dependency barrier, to make it work correctly.  Consider the
+following bit of code:
+
+	q = READ_ONCE(a);
+	if (q) {
+		<data dependency barrier>  /* BUG: No data dependency!!! */
+		p = READ_ONCE(b);
+	}
+
+This will not have the desired effect because there is no actual data
+dependency, but rather a control dependency that the CPU may short-circuit
+by attempting to predict the outcome in advance, so that other CPUs see the
+load from b as having happened before the load from a.  In such a case
+what's actually required is:
+
+	q = READ_ONCE(a);
+	if (q) {
+		<read barrier>
+		p = READ_ONCE(b);
+	}
+
+However, stores are not speculated.  This means that ordering -is- provided
+for load-store control dependencies, as in the following example:
+
+	q = READ_ONCE(a);
+	if (q) {
+		WRITE_ONCE(b, p);
+	}
+
+Control dependencies pair normally with other types of barriers.  That said,
+please note that READ_ONCE() is not optional!  Without the READ_ONCE(), the
+compiler might combine the load from 'a' with other loads from 'a', and the
+store to 'b' with other stores to 'b', with possible highly counterintuitive
+effects on ordering.
+
+Worse yet, if the compiler is able to prove (say) that the value of variable
+'a' is always non-zero, it would be well within its rights to optimize the
+original example by eliminating the "if" statement as follows:
+
+	q = a;
+	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
+
+So don't leave out the READ_ONCE().
+
+It is tempting to try to enforce ordering on identical stores on both
+branches of the "if" statement as follows:
+
+	q = READ_ONCE(a);
+	if (q) {
+		barrier();
+		WRITE_ONCE(b, p);
+		do_something();
+	} else {
+		barrier();
+		WRITE_ONCE(b, p);
+		do_something_else();
+	}
+
+Unfortunately, current compilers will transform this as follows at high
+optimization levels:
+
+	q = READ_ONCE(a);
+	barrier();
+	WRITE_ONCE(b, p);  /* BUG: No ordering vs. load from a!!! */
+	if (q) {
+		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
+		do_something();
+	} else {
+		/* WRITE_ONCE(b, p); -- moved up, BUG!!! */
+		do_something_else();
+	}
+
+Now there is no conditional between the load from 'a' and the store to 'b',
+which means that the CPU is within its rights to reorder them: the
+conditional is absolutely required, and must be present in the assembly code
+even after all compiler optimizations have been applied.  Therefore, if you
+need ordering in this example, you need explicit memory barriers, for
+example, smp_store_release():
+
+	q = READ_ONCE(a);
+	if (q) {
+		smp_store_release(&b, p);
+		do_something();
+	} else {
+		smp_store_release(&b, p);
+		do_something_else();
+	}
+
+In contrast, without explicit memory barriers, two-legged-if control
+ordering is guaranteed only when the stores differ, for example:
+
+	q = READ_ONCE(a);
+	if (q) {
+		WRITE_ONCE(b, p);
+		do_something();
+	} else {
+		WRITE_ONCE(b, r);
+		do_something_else();
+	}
+
+The initial READ_ONCE() is still required to prevent the compiler from
+proving the value of 'a'.
+
+In addition, you need to be careful what you do with the local variable 'q',
+otherwise the compiler might be able to guess the value and again remove the
+needed conditional.  For example:
+
+	q = READ_ONCE(a);
+	if (q % MAX) {
+		WRITE_ONCE(b, p);
+		do_something();
+	} else {
+		WRITE_ONCE(b, r);
+		do_something_else();
+	}
+
+If MAX is defined to be 1, then the compiler knows that (q % MAX) is equal
+to zero, in which case the compiler is within its rights to transform the
+above code into the following:
+
+	q = READ_ONCE(a);
+	WRITE_ONCE(b, p);
+	do_something_else();
+
+Given this transformation, the CPU is not required to respect the ordering
+between the load from variable 'a' and the store to variable 'b'.  It is
+tempting to add a barrier(), but this does not help.  The conditional is
+gone, and the barrier won't bring it back.  Therefore, if you are relying on
+this ordering, you should make sure that MAX is greater than one, perhaps as
+follows:
+
+	q = READ_ONCE(a);
+	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
+	if (q % MAX) {
+		WRITE_ONCE(b, p);
+		do_something();
+	} else {
+		WRITE_ONCE(b, r);
+		do_something_else();
+	}
+
+Please note once again that the stores to 'b' differ.  If they were
+identical, as noted earlier, the compiler could pull this store outside of
+the 'if' statement.
+
+You must also be careful not to rely too much on boolean short-circuit
+evaluation.  Consider this example:
+
+	q = READ_ONCE(a);
+	if (q || 1 > 0)
+		WRITE_ONCE(b, 1);
+
+Because the first condition cannot fault and the second condition is always
+true, the compiler can transform this example as follows, defeating the
+control dependency:
+
+	q = READ_ONCE(a);
+	WRITE_ONCE(b, 1);
+
+This example underscores the need to ensure that the compiler cannot
+out-guess your code.  More generally, although READ_ONCE() does force the
+compiler to actually emit code for a given load, it does not force the
+compiler to use the results.
+
+Finally, control dependencies do -not- provide transitivity.  This is
+demonstrated by two related examples, with the initial values of x and y
+both being zero:
+
+	CPU 0                     CPU 1
+	=======================   =======================
+	r1 = READ_ONCE(x);        r2 = READ_ONCE(y);
+	if (r1 > 0)               if (r2 > 0)
+	  WRITE_ONCE(y, 1);         WRITE_ONCE(x, 1);
+
+	assert(!(r1 == 1 && r2 == 1));
+
+The above two-CPU example will never trigger the assert().  However, if
+control dependencies guaranteed transitivity (which they do not), then
+adding the following CPU would guarantee a related assertion:
+
+	CPU 2
+	=====================
+	WRITE_ONCE(x, 2);
+
+	assert(!(r1 == 2 && r2 == 1 && x == 2)); /* FAILS!!! */
+
+But because control dependencies do -not- provide transitivity, the above
+assertion can fail after the combined three-CPU example completes.  If you
+need the three-CPU example to provide ordering, you will need smp_mb()
+between the loads and stores in the CPU 0 and CPU 1 code fragments, that is,
+just before or just after the "if" statements.  Furthermore, the original
+two-CPU example is very fragile and should be avoided.
+
+These two examples are the LB and WWC litmus tests from this paper:
+http://www.cl.cam.ac.uk/users/pes20/ppc-supplemental/test6.pdf and this
+site: https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html.
+
+In summary:
+
+  (*) Control dependencies can order prior loads against later stores.
+      However, they do -not- guarantee any other sort of ordering: not prior
+      loads against later loads, nor prior stores against later anything.
+      If you need these other forms of ordering, use smp_rmb(), smp_wmb(),
+      or, in the case of prior stores and later loads, smp_mb().
+
+  (*) If both legs of the "if" statement begin with identical stores to the
+      same variable, then those stores must be ordered, either by preceding
+      both of them with smp_mb() or by using smp_store_release() to carry
+      out the stores.  Please note that it is -not- sufficient to use
+      barrier() at the beginning of each leg of the "if" statement because,
+      as shown by the example above, optimizing compilers can destroy the
+      control dependency while respecting the letter of the barrier() law.
+
+  (*) Control dependencies require at least one run-time conditional between
+      the prior load and the subsequent store, and this conditional must
+      involve the prior load.  If the compiler is able to optimize the
+      conditional away, it will have also optimized away the ordering.
+      Careful use of READ_ONCE() and WRITE_ONCE() can help to preserve the
+      needed conditional.
+
+  (*) Control dependencies require that the compiler avoid reordering the
+      dependency into nonexistence.  Careful use of READ_ONCE() or
+      atomic{,64}_read() can help to preserve your control dependency.
+      Please see the COMPILER BARRIER section for more information.
+
+  (*) Control dependencies pair normally with other types of barriers.
+
+  (*) Control dependencies do -not- provide transitivity.  If you need
+      transitivity, use smp_mb().
+
+
+SMP BARRIER PAIRING
+-------------------
+
+When dealing with CPU-CPU interactions, certain types of memory barrier
+should always be paired.  A lack of appropriate pairing is almost certainly
+an error.
+
+General barriers pair with each other, though they also pair with most other
+types of barriers, albeit without transitivity.  An acquire barrier pairs
+with a release barrier, but both may also pair with other barriers,
+including of course general barriers.  A write barrier pairs with a data
+dependency barrier, a control dependency, an acquire barrier, a release
+barrier, a read barrier, or a general barrier.  Similarly a read barrier,
+control dependency, or a data dependency barrier pairs with a write barrier,
+an acquire barrier, a release barrier, or a general barrier:
+
+	CPU 1		      CPU 2
+	===============	      ===============
+	WRITE_ONCE(a, 1);
+	<write barrier>
+	WRITE_ONCE(b, 2);     x = READ_ONCE(b);
+			      <read barrier>
+			      y = READ_ONCE(a);
+
+Or:
+
+	CPU 1		      CPU 2
+	===============	      ===============================
+	a = 1;
+	<write barrier>
+	WRITE_ONCE(b, &a);    x = READ_ONCE(b);
+			      <data dependency barrier>
+			      y = *x;
+
+Or even:
+
+	CPU 1		      CPU 2
+	===============	      ===============================
+	r1 = READ_ONCE(y);
+	<general barrier>
+	WRITE_ONCE(x, 1);     if (r2 = READ_ONCE(x)) {
+			         <implicit control dependency>
+			         WRITE_ONCE(y, 1);
+			      }
+
+	assert(r1 == 0 || r2 == 0);
+
+Basically, the read barrier always has to be there, even though it can be of
+the "weaker" type.
+
+[!] Note that the stores before the write barrier would normally be expected
+to match the loads after the read barrier or the data dependency barrier,
+and vice versa:
+
+	CPU 1                               CPU 2
+	===================                 ===================
+	WRITE_ONCE(a, 1);    }----   --->{  v = READ_ONCE(c);
+	WRITE_ONCE(b, 2);    }    \ /    {  w = READ_ONCE(d);
+	<write barrier>            \        <read barrier>
+	WRITE_ONCE(c, 3);    }    / \    {  x = READ_ONCE(a);
+	WRITE_ONCE(d, 4);    }----   --->{  y = READ_ONCE(b);
+
+
+EXAMPLES OF MEMORY BARRIER SEQUENCES
+------------------------------------
+
+Firstly, write barriers act as partial orderings on store operations.
+Consider the following sequence of events:
+
+	CPU 1
+	=======================
+	STORE A = 1
+	STORE B = 2
+	STORE C = 3
+	<write barrier>
+	STORE D = 4
+	STORE E = 5
+
+This sequence of events is committed to the memory coherence system in an
+order that the rest of the system might perceive as the unordered set of
+{ STORE A, STORE B, STORE C } all occurring before the unordered set of
+{ STORE D, STORE E }:
+
+	+-------+       :      :
+	|       |       +------+
+	|       |------>| C=3  |     }     /\
+	|       |  :    +------+     }-----  \  -----> Events perceptible to
+	|       |  :    | A=1  |     }        \/       the rest of the system
+	|       |  :    +------+     }
+	| CPU 1 |  :    | B=2  |     }
+	|       |       +------+     }
+	|       |   wwwwwwwwwwwwwwww }   <--- At this point the write barrier
+	|       |       +------+     }        requires all stores prior to the
+	|       |  :    | E=5  |     }        barrier to be committed before
+	|       |  :    +------+     }        further stores may take place
+	|       |------>| D=4  |     }
+	|       |       +------+
+	+-------+       :      :
+	                   |
+	                   | Sequence in which stores are committed to the
+	                   | memory system by CPU 1
+	                   V
+
+
+Secondly, data dependency barriers act as partial orderings on data-dependent
+loads.  Consider the following sequence of events:
+
+	CPU 1			CPU 2
+	=======================	=======================
+		{ B = 7; X = 9; Y = 8; C = &Y }
+	STORE A = 1
+	STORE B = 2
+	<write barrier>
+	STORE C = &B		LOAD X
+	STORE D = 4		LOAD C (gets &B)
+				LOAD *C (reads B)
+
+Without intervention, CPU 2 may perceive the events on CPU 1 in some
+effectively random order, despite the write barrier issued by CPU 1:
+
+	+-------+       :      :                :       :
+	|       |       +------+                +-------+  | Sequence of update
+	|       |------>| B=2  |-----       --->| Y->8  |  | of perception on
+	|       |  :    +------+     \          +-------+  | CPU 2
+	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
+	|       |       +------+       |        +-------+
+	|       |   wwwwwwwwwwwwwwww   |        :       :
+	|       |       +------+       |        :       :
+	|       |  :    | C=&B |---    |        :       :       +-------+
+	|       |  :    +------+   \   |        +-------+       |       |
+	|       |------>| D=4  |    ----------->| C->&B |------>|       |
+	|       |       +------+       |        +-------+       |       |
+	+-------+       :      :       |        :       :       |       |
+	                               |        :       :       |       |
+	                               |        :       :       | CPU 2 |
+	                               |        +-------+       |       |
+	    Apparently incorrect --->  |        | B->7  |------>|       |
+	    perception of B (!)        |        +-------+       |       |
+	                               |        :       :       |       |
+	                               |        +-------+       |       |
+	    The load of X holds --->    \       | X->9  |------>|       |
+	    up the maintenance           \      +-------+       |       |
+	    of coherence of B             ----->| B->2  |       +-------+
+	                                        +-------+
+	                                        :       :
+
+
+In the above example, CPU 2 perceives that B is 7, despite the load of *C
+(which would be B) coming after the LOAD of C.
+
+If, however, a data dependency barrier were to be placed between the load of
+C and the load of *C (ie: B) on CPU 2:
+
+	CPU 1			CPU 2
+	=======================	=======================
+		{ B = 7; X = 9; Y = 8; C = &Y }
+	STORE A = 1
+	STORE B = 2
+	<write barrier>
+	STORE C = &B		LOAD X
+	STORE D = 4		LOAD C (gets &B)
+				<data dependency barrier>
+				LOAD *C (reads B)
+
+then the following will occur:
+
+	+-------+       :      :                :       :
+	|       |       +------+                +-------+
+	|       |------>| B=2  |-----       --->| Y->8  |
+	|       |  :    +------+     \          +-------+
+	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |
+	|       |       +------+       |        +-------+
+	|       |   wwwwwwwwwwwwwwww   |        :       :
+	|       |       +------+       |        :       :
+	|       |  :    | C=&B |---    |        :       :       +-------+
+	|       |  :    +------+   \   |        +-------+       |       |
+	|       |------>| D=4  |    ----------->| C->&B |------>|       |
+	|       |       +------+       |        +-------+       |       |
+	+-------+       :      :       |        :       :       |       |
+	                               |        :       :       |       |
+	                               |        :       :       | CPU 2 |
+	                               |        +-------+       |       |
+	                               |        | X->9  |------>|       |
+	                               |        +-------+       |       |
+	  Makes sure all effects --->   \   ddddddddddddddddd   |       |
+	  prior to the store of C        \      +-------+       |       |
+	  are perceptible to              ----->| B->2  |------>|       |
+	  subsequent loads                      +-------+       |       |
+	                                        :       :       +-------+
+
+
+And thirdly, a read barrier acts as a partial order on loads.  Consider the
+following sequence of events:
+
+	CPU 1			CPU 2
+	=======================	=======================
+		{ A = 0, B = 9 }
+	STORE A=1
+	<write barrier>
+	STORE B=2
+				LOAD B
+				LOAD A
+
+Without intervention, CPU 2 may then choose to perceive the events on CPU 1
+in some effectively random order, despite the write barrier issued by CPU 1:
+
+	+-------+       :      :                :       :
+	|       |       +------+                +-------+
+	|       |------>| A=1  |------      --->| A->0  |
+	|       |       +------+      \         +-------+
+	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
+	|       |       +------+        |       +-------+
+	|       |------>| B=2  |---     |       :       :
+	|       |       +------+   \    |       :       :       +-------+
+	+-------+       :      :    \   |       +-------+       |       |
+	                             ---------->| B->2  |------>|       |
+	                                |       +-------+       | CPU 2 |
+	                                |       | A->0  |------>|       |
+	                                |       +-------+       |       |
+	                                |       :       :       +-------+
+	                                 \      :       :
+	                                  \     +-------+
+	                                   ---->| A->1  |
+	                                        +-------+
+	                                        :       :
+
+
+If, however, a read barrier were placed between the load of B and the load
+of A on CPU 2:
+
+	CPU 1			CPU 2
+	=======================	=======================
+		{ A = 0, B = 9 }
+	STORE A=1
+	<write barrier>
+	STORE B=2
+				LOAD B
+				<read barrier>
+				LOAD A
+
+then the partial ordering imposed by CPU 1 will be perceived correctly by
+CPU 2:
+
+	+-------+       :      :                :       :
+	|       |       +------+                +-------+
+	|       |------>| A=1  |------      --->| A->0  |
+	|       |       +------+      \         +-------+
+	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
+	|       |       +------+        |       +-------+
+	|       |------>| B=2  |---     |       :       :
+	|       |       +------+   \    |       :       :       +-------+
+	+-------+       :      :    \   |       +-------+       |       |
+	                             ---------->| B->2  |------>|       |
+	                                |       +-------+       | CPU 2 |
+	                                |       :       :       |       |
+	                                |       :       :       |       |
+	  At this point the read ---->   \  rrrrrrrrrrrrrrrrr   |       |
+	  barrier causes all effects      \     +-------+       |       |
+	  prior to the storage of B       ----->| A->1  |------>|       |
+	  to be perceptible to CPU 2            +-------+       |       |
+	                                        :       :       +-------+
+
+
+To illustrate this more completely, consider what could happen if the code
+contained a load of A either side of the read barrier:
+
+	CPU 1			CPU 2
+	=======================	=======================
+		{ A = 0, B = 9 }
+	STORE A=1
+	<write barrier>
+	STORE B=2
+				LOAD B
+				LOAD A [first load of A]
+				<read barrier>
+				LOAD A [second load of A]
+
+Even though the two loads of A both occur after the load of B, they may both
+come up with different values:
+
+	+-------+       :      :                :       :
+	|       |       +------+                +-------+
+	|       |------>| A=1  |------      --->| A->0  |
+	|       |       +------+      \         +-------+
+	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
+	|       |       +------+        |       +-------+
+	|       |------>| B=2  |---     |       :       :
+	|       |       +------+   \    |       :       :       +-------+
+	+-------+       :      :    \   |       +-------+       |       |
+	                             ---------->| B->2  |------>|       |
+	                                |       +-------+       | CPU 2 |
+	                                |       :       :       |       |
+	                                |       :       :       |       |
+	                                |       +-------+       |       |
+	                                |       | A->0  |------>| 1st   |
+	                                |       +-------+       |       |
+	  At this point the read ---->   \  rrrrrrrrrrrrrrrrr   |       |
+	  barrier causes all effects      \     +-------+       |       |
+	  prior to the storage of B       ----->| A->1  |------>| 2nd   |
+	  to be perceptible to CPU 2            +-------+       |       |
+	                                        :       :       +-------+
+
+
+But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
+before the read barrier completes anyway:
+
+	+-------+       :      :                :       :
+	|       |       +------+                +-------+
+	|       |------>| A=1  |------      --->| A->0  |
+	|       |       +------+      \         +-------+
+	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
+	|       |       +------+        |       +-------+
+	|       |------>| B=2  |---     |       :       :
+	|       |       +------+   \    |       :       :       +-------+
+	+-------+       :      :    \   |       +-------+       |       |
+	                             ---------->| B->2  |------>|       |
+	                                |       +-------+       | CPU 2 |
+	                                |       :       :       |       |
+	                                 \      :       :       |       |
+	                                  \     +-------+       |       |
+	                                   ---->| A->1  |------>| 1st   |
+	                                        +-------+       |       |
+	                                    rrrrrrrrrrrrrrrrr   |       |
+	                                        +-------+       |       |
+	                                        | A->1  |------>| 2nd   |
+	                                        +-------+       |       |
+	                                        :       :       +-------+
+
+The guarantee is that the second load will always come up with A == 1 if the
+load of B came up with B == 2.  No such guarantee exists for the first load
+of A; that may come up with either A == 0 or A == 1.
+
+
+READ MEMORY BARRIERS VS LOAD SPECULATION
+----------------------------------------
+
+Many CPUs speculate with loads: that is they see that they will need to load
+an item from memory, and they find a time where they're not using the bus
+for any other loads, and so do the load in advance - even though they
+haven't actually got to that point in the instruction execution flow yet.
+This permits the actual load instruction to potentially complete immediately
+because the CPU already has the value to hand.
+
+It may turn out that the CPU didn't actually need the value - perhaps
+because a branch circumvented the load - in which case it can discard the
+value or just cache it for later use.
+
+Consider:
+
+	CPU 1			CPU 2
+	=======================	=======================
+				LOAD B
+				DIVIDE		} Divide instructions generally
+				DIVIDE		} take a long time to perform
+				LOAD A
+
+Which might appear as this:
+
+	                                        :       :       +-------+
+	                                        +-------+       |       |
+	                                    --->| B->2  |------>|       |
+	                                        +-------+       | CPU 2 |
+	                                        :       :DIVIDE |       |
+	                                        +-------+       |       |
+	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
+	division speculates on the             +-------+   ~    |       |
+	LOAD of A                               :       :   ~   |       |
+	                                        :       :DIVIDE |       |
+	                                        :       :   ~   |       |
+	Once the divisions are complete -->     :       :   ~-->|       |
+	the CPU can then perform the            :       :       |       |
+	LOAD with immediate effect              :       :       +-------+
+
+
+Placing a read barrier or a data dependency barrier just before the second
+load:
+
+	CPU 1			CPU 2
+	=======================	=======================
+				LOAD B
+				DIVIDE
+				DIVIDE
+				<read barrier>
+				LOAD A
+
+will force any value speculatively obtained to be reconsidered to an extent
+dependent on the type of barrier used.  If there was no change made to the
+speculated memory location, then the speculated value will just be used:
+
+	                                        :       :       +-------+
+	                                        +-------+       |       |
+	                                    --->| B->2  |------>|       |
+	                                        +-------+       | CPU 2 |
+	                                        :       :DIVIDE |       |
+	                                        +-------+       |       |
+	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
+	division speculates on the             +-------+   ~    |       |
+	LOAD of A                               :       :   ~   |       |
+	                                        :       :DIVIDE |       |
+	                                        :       :   ~   |       |
+	                                        :       :   ~   |       |
+	                                    rrrrrrrrrrrrrrrr~   |       |
+	                                        :       :   ~   |       |
+	                                        :       :   ~-->|       |
+	                                        :       :       |       |
+	                                        :       :       +-------+
+
+
+but if there was an update or an invalidation from another CPU pending, then
+the speculation will be cancelled and the value reloaded:
+
+	                                        :       :       +-------+
+	                                        +-------+       |       |
+	                                    --->| B->2  |------>|       |
+	                                        +-------+       | CPU 2 |
+	                                        :       :DIVIDE |       |
+	                                        +-------+       |       |
+	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
+	division speculates on the             +-------+   ~    |       |
+	LOAD of A                               :       :   ~   |       |
+	                                        :       :DIVIDE |       |
+	                                        :       :   ~   |       |
+	                                        :       :   ~   |       |
+	                                    rrrrrrrrrrrrrrrrr   |       |
+	                                        +-------+       |       |
+	The speculation is discarded --->   --->| A->1  |------>|       |
+	and an updated value is                 +-------+       |       |
+	retrieved                               :       :       +-------+
+
+
+TRANSITIVITY
+------------
+
+Transitivity is a deeply intuitive notion about ordering that is not always
+provided by real computer systems.  The following example demonstrates
+transitivity:
+
+	CPU 1			CPU 2			CPU 3
+	=======================	=======================	=======================
+		{ X = 0, Y = 0 }
+	STORE X=1		LOAD X			STORE Y=1
+				<general barrier>	<general barrier>
+				LOAD Y			LOAD X
+
+Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
+This indicates that CPU 2's load from X in some sense follows CPU 1's store
+to X and that CPU 2's load from Y in some sense preceded CPU 3's store to Y.
+The question is then "Can CPU 3's load from X return 0?"
+
+Because CPU 2's load from X in some sense came after CPU 1's store, it is
+natural to expect that CPU 3's load from X must therefore return 1.  This
+expectation is an example of transitivity: if a load executing on CPU A
+follows a load from the same variable executing on CPU B, then CPU A's load
+must either return the same value that CPU B's load did, or must return some
+later value.
+
+In the Linux kernel, use of general memory barriers guarantees transitivity.
+Therefore, in the above example, if CPU 2's load from X returns 1 and its
+load from Y returns 0, then CPU 3's load from X must also return 1.
+
+However, transitivity is -not- guaranteed for read or write barriers.  For
+example, suppose that CPU 2's general barrier in the above example is
+changed to a read barrier as shown below:
+
+	CPU 1			CPU 2			CPU 3
+	=======================	=======================	=======================
+		{ X = 0, Y = 0 }
+	STORE X=1		LOAD X			STORE Y=1
+				<read barrier>		<general barrier>
+				LOAD Y			LOAD X
+
+This substitution destroys transitivity: in this example, it is perfectly
+legal for CPU 2's load from X to return 1, its load from Y to return 0, and
+CPU 3's load from X to return 0.
+
+The key point is that although CPU 2's read barrier orders its pair of
+loads, it does not guarantee to order CPU 1's store.  Therefore, if this
+example runs on a system where CPUs 1 and 2 share a store buffer or a level
+of cache, CPU 2 might have early access to CPU 1's writes.  General barriers
+are therefore required to ensure that all CPUs agree on the combined order
+of CPU 1's and CPU 2's accesses.
+
+General barriers provide "global transitivity", so that all CPUs will agree
+on the order of operations.  In contrast, a chain of release-acquire pairs
+provides only "local transitivity", so that only those CPUs on the chain are
+guaranteed to agree on the combined order of the accesses.  For example,
+switching to C code in deference to Herman Hollerith:
+
+	int u, v, x, y, z;
+
+	void cpu0(void)
+	{
+		r0 = smp_load_acquire(&x);
+		WRITE_ONCE(u, 1);
+		smp_store_release(&y, 1);
+	}
+
+	void cpu1(void)
+	{
+		r1 = smp_load_acquire(&y);
+		r4 = READ_ONCE(v);
+		r5 = READ_ONCE(u);
+		smp_store_release(&z, 1);
+	}
+
+	void cpu2(void)
+	{
+		r2 = smp_load_acquire(&z);
+		smp_store_release(&x, 1);
+	}
+
+	void cpu3(void)
+	{
+		WRITE_ONCE(v, 1);
+		smp_mb();
+		r3 = READ_ONCE(u);
+	}
+
+Because cpu0(), cpu1(), and cpu2() participate in a local transitive chain
+of smp_store_release()/smp_load_acquire() pairs, the following outcome is
+prohibited:
+
+	r0 == 1 && r1 == 1 && r2 == 1
+
+Furthermore, because of the release-acquire relationship between cpu0() and
+cpu1(), cpu1() must see cpu0()'s writes, so that the following outcome is
+prohibited:
+
+	r1 == 1 && r5 == 0
+
+However, the transitivity of release-acquire is local to the participating
+CPUs and does not apply to cpu3().  Therefore, the following outcome is
+possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0
+
+As an aside, the following outcome is also possible:
+
+	r0 == 0 && r1 == 1 && r2 == 1 && r3 == 0 && r4 == 0 && r5 == 1
+
+Although cpu0(), cpu1(), and cpu2() will see their respective reads and
+writes in order, CPUs not involved in the release-acquire chain might well
+disagree on the order.  This disagreement stems from the fact that the weak
+memory-barrier instructions used to implement smp_load_acquire() and
+smp_store_release() are not required to order prior stores against
+subsequent loads in all cases.  This means that cpu3() can see cpu0()'s
+store to u as happening -after- cpu1()'s load from v, even though both
+cpu0() and cpu1() agree that these two operations occurred in the intended
+order.
+
+However, please keep in mind that smp_load_acquire() is not magic.  In
+particular, it simply reads from its argument with ordering.  It does -not-
+ensure that any particular value will be read.  Therefore, the following
+outcome is possible:
+
+	r0 == 0 && r1 == 0 && r2 == 0 && r5 == 0
+
+Note that this outcome can happen even on a mythical sequentially consistent
+system where nothing is ever reordered.
+
+To reiterate, if your code requires global transitivity, use general
+barriers throughout.
+
+
+========================
+EXPLICIT KERNEL BARRIERS
+========================
+
+The Linux kernel has a variety of different barriers that act at different
+levels:
+
+ (*) Compiler barrier.
+
+ (*) CPU memory barriers.
+
+ (*) MMIO write barrier.
+
+
+COMPILER BARRIER
+----------------
+
+The Linux kernel has an explicit compiler barrier function that prevents the
+compiler from moving the memory accesses either side of it to the other
+side:
+
+	barrier();
+
+This is a general barrier -- there are no read-read or write-write variants
+of barrier().  However, READ_ONCE() and WRITE_ONCE() can be thought of as
+weak forms of barrier() that affect only the specific accesses flagged by
+the READ_ONCE() or WRITE_ONCE().
+
+The barrier() function has the following effects:
+
+ (*) Prevents the compiler from reordering accesses following the barrier()
+     to precede any accesses preceding the barrier().  One example use for
+     this property is to ease communication between interrupt-handler code
+     and the code that was interrupted.
+
+ (*) Within a loop, forces the compiler to load the variables used in that
+     loop's conditional on each pass through that loop.
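+
+As a minimal sketch of the second property (the flag name here is made up
+for the example):
+
+	while (!need_to_stop) {	/* without the barrier(), the compiler   */
+		barrier();	/* could cache need_to_stop in a register */
+		do_work();	/* and spin forever                       */
+	}
+
+READ_ONCE(need_to_stop) in the loop condition would achieve the same effect
+for just this one access.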
+
+The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
+optimizations that, while perfectly safe in single-threaded code, can be
+fatal in concurrent code.  Here are some examples of these sorts of
+optimizations:
+
+ (*) The compiler is within its rights to reorder loads and stores to the
+     same variable, and in some cases, the CPU is within its rights to
+     reorder loads to the same variable.  This means that the following
+     code:
+
+	a[0] = x;
+	a[1] = x;
+
+     Might result in an older value of x stored in a[1] than in a[0].
+     Prevent both the compiler and the CPU from doing this as follows:
+
+	a[0] = READ_ONCE(x);
+	a[1] = READ_ONCE(x);
+
+     In short, READ_ONCE() and WRITE_ONCE() provide cache coherence for
+     accesses from multiple CPUs to a single variable.
+
+ (*) The compiler is within its rights to merge successive loads from the
+     same variable.  Such merging can cause the compiler to "optimize" the
+     following code:
+
+	while (tmp = a)
+		do_something_with(tmp);
+
+     into the following code, which, although in some sense legitimate for
+     single-threaded code, is almost certainly not what the developer
+     intended:
+
+	if (tmp = a)
+		for (;;)
+			do_something_with(tmp);
+
+     Use READ_ONCE() to prevent the compiler from doing this to you:
+
+	while (tmp = READ_ONCE(a))
+		do_something_with(tmp);
+
+ (*) The compiler is within its rights to reload a variable, for example,
+     in cases where high register pressure prevents the compiler from
+     keeping all data of interest in registers.  The compiler might
+     therefore optimize the variable 'tmp' out of our previous example:
+
+	while (tmp = a)
+		do_something_with(tmp);
+
+     This could result in the following code, which is perfectly safe in
+     single-threaded code, but can be fatal in concurrent code:
+
+	while (a)
+		do_something_with(a);
+
+     For example, the optimized version of this code could result in
+     passing a zero to do_something_with() in the case where the variable
+     a was modified by some other CPU between the "while" statement and
+     the call to do_something_with().
+
+     Again, use READ_ONCE() to prevent the compiler from doing this:
+
+	while (tmp = READ_ONCE(a))
+		do_something_with(tmp);
+
+     Note that if the compiler runs short of registers, it might save tmp
+     onto the stack.  The overhead of this saving and later restoring is
+     why compilers reload variables.  Doing so is perfectly safe for
+     single-threaded code, so you need to tell the compiler about cases
+     where it is not safe.
+
+ (*) The compiler is within its rights to omit a load entirely if it knows
+     what the value will be.  For example, if the compiler can prove that
+     the value of variable 'a' is always zero, it can optimize this code:
+
+	while (tmp = a)
+		do_something_with(tmp);
+
+     Into this:
+
+	do { } while (0);
+
+     This transformation is a win for single-threaded code because it gets
+     rid of a load and a branch.  The problem is that the compiler will
+     carry out its proof assuming that the current CPU is the only one
+     updating variable 'a'.  If variable 'a' is shared, then the compiler's
+     proof will be erroneous.  Use READ_ONCE() to tell the compiler that it
+     doesn't know as much as it thinks it does:
+
+	while (tmp = READ_ONCE(a))
+		do_something_with(tmp);
+
+     But please note that the compiler is also closely watching what you do
+     with the value after the READ_ONCE().  For example, suppose you do the
+     following and MAX is a preprocessor macro with the value 1:
+
+	while ((tmp = READ_ONCE(a)) % MAX)
+		do_something_with(tmp);
+
+     Then the compiler knows that the result of the "%" operator applied to
+     MAX will always be zero, again allowing the compiler to optimize the
+     code into near-nonexistence.  (It will still load from the variable
+     'a'.)
+
+ (*) Similarly, the compiler is within its rights to omit a store entirely
+     if it knows that the variable already has the value being stored.
+     Again, the compiler assumes that the current CPU is the only one
+     storing into the variable, which can cause the compiler to do the
+     wrong thing for shared variables.  For example, suppose you have the
+     following:
+
+	a = 0;
+	... Code that does not store to variable a ...
+	a = 0;
+
+     The compiler sees that the value of variable 'a' is already zero, so
+     it might well omit the second store.  This would come as a fatal
+     surprise if some other CPU might have stored to variable 'a' in the
+     meantime.
+
+     Use WRITE_ONCE() to prevent the compiler from making this sort of
+     wrong guess:
+
+	WRITE_ONCE(a, 0);
+	... Code that does not store to variable a ...
+	WRITE_ONCE(a, 0);
+
+ (*) The compiler is within its rights to reorder memory accesses unless
+     you tell it not to.  For example, consider the following interaction
+     between process-level code and an interrupt handler:
+
+	void process_level(void)
+	{
+		msg = get_message();
+		flag = true;
+	}
+
+	void interrupt_handler(void)
+	{
+		if (flag)
+			process_message(msg);
+	}
+
+     There is nothing to prevent the compiler from transforming
+     process_level() to the following, which might well be a win for
+     single-threaded code:
+
+	void process_level(void)
+	{
+		flag = true;
+		msg = get_message();
+	}
+
+     If the interrupt occurs between these two statements, then
+     interrupt_handler() might be passed a garbled msg.  Use WRITE_ONCE()
+     to prevent this as follows:
+
+	void process_level(void)
+	{
+		WRITE_ONCE(msg, get_message());
+		WRITE_ONCE(flag, true);
+	}
+
+	void interrupt_handler(void)
+	{
+		if (READ_ONCE(flag))
+			process_message(READ_ONCE(msg));
+	}
+
+     Note that the READ_ONCE() and WRITE_ONCE() wrappers in
+     interrupt_handler() are needed if this interrupt handler can itself
+     be interrupted by something that also accesses 'flag' and 'msg', for
+     example, a nested interrupt or an NMI.  Otherwise, READ_ONCE() and
+     WRITE_ONCE() are not needed in interrupt_handler() other than for
+     documentation purposes.  (Note also that nested interrupts do not
+     typically occur in modern Linux kernels, in fact, if an interrupt
+     handler returns with interrupts enabled, you will get a WARN_ONCE()
+     splat.)
+
+     You should assume that the compiler can move READ_ONCE() and
+     WRITE_ONCE() past code not containing READ_ONCE(), WRITE_ONCE(),
+     barrier(), or similar primitives.
+
+     This effect could also be achieved using barrier(), but READ_ONCE()
+     and WRITE_ONCE() are more selective: with READ_ONCE() and
+     WRITE_ONCE(), the compiler need only forget the contents of the
+     indicated memory locations, while with barrier() the compiler must
+     discard the value of all memory locations that it has currently
+     cached in any machine registers.  Of course, the compiler must also
+     respect the order in which the READ_ONCE()s and WRITE_ONCE()s occur,
+     though the CPU of course need not do so.
+
+ (*) The compiler is within its rights to invent stores to a variable, as
+     in the following example:
+
+	if (a)
+		b = a;
+	else
+		b = 42;
+
+     The compiler might save a branch by optimizing this as follows:
+
+	b = 42;
+	if (a)
+		b = a;
+
+     In single-threaded code, this is not only safe, but also saves a
+     branch.  Unfortunately, in concurrent code, this optimization could
+     cause some other CPU to see a spurious value of 42 -- even if variable
+     'b' was never 42 -- arriving between loading variable 'a' and the
+     test.  Use WRITE_ONCE() to prevent this as follows:
+
+	if (a)
+		WRITE_ONCE(b, a);
+	else
+		WRITE_ONCE(b, 42);
+
+     The compiler can also invent loads.  These are usually less damaging,
+     but they can result in cache-line bouncing and thus in poor
+     performance and scalability.  Use READ_ONCE() to prevent invented
+     loads.
+
+ (*) For aligned memory locations whose size allows them to be accessed
+     with a single memory-reference instruction, READ_ONCE() and
+     WRITE_ONCE() prevent "load tearing" and "store tearing", in which a
+     single large access is replaced by multiple smaller accesses.  For
+     example, given an architecture having 16-bit store instructions with
+     7-bit immediate fields, the compiler might be tempted to use two
+     16-bit store-immediate instructions to implement the following 32-bit
+     store:
+
+	p = 0x00010002;
+
+     Please note that GCC really does use this sort of optimization, which
+     is not surprising given that it would likely take more than two
+     instructions to build the constant and then store it.  This
+     optimization can therefore be a win in single-threaded code.  In
+     fact, a recent (and since fixed) bug caused GCC to incorrectly use
+     this optimization in a volatile store.  In the absence of such bugs,
+     use of WRITE_ONCE() prevents store tearing in the following example:
+
+	WRITE_ONCE(p, 0x00010002);
+
+     Use of packed structures can also result in load and store tearing,
+     as in this example:
+
+	struct __attribute__((__packed__)) foo {
+		short a;
+		int b;
+		short c;
+	};
+	struct foo foo1, foo2;
+	...
+
+	foo2.a = foo1.a;
+	foo2.b = foo1.b;
+	foo2.c = foo1.c;
+
+     Because there are no READ_ONCE() or WRITE_ONCE() wrappers and no
+     volatile markings, the compiler would be well within its rights to
+     implement these three assignment statements as a pair of 32-bit loads
+     followed by a pair of 32-bit stores.  This would result in load
+     tearing on 'foo1.b' and store tearing on 'foo2.b'.  READ_ONCE() and
+     WRITE_ONCE() again prevent tearing in this example:
+
+	foo2.a = foo1.a;
+	WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
+	foo2.c = foo1.c;
+
+All that aside, it is never necessary to use READ_ONCE() and WRITE_ONCE()
+on a variable that has been marked volatile.  For example, because
+'jiffies' is marked volatile, it is never necessary to say
+READ_ONCE(jiffies).  The reason for this is that READ_ONCE() and
+WRITE_ONCE() are implemented as volatile casts, which has no effect when
+its argument is already marked volatile.
+
+Please note that these compiler barriers have no direct effect on the CPU,
+which may then reorder things however it wishes.
+
+
+CPU MEMORY BARRIERS
+-------------------
+
+The Linux kernel has eight basic CPU memory barriers:
+
+	TYPE		MANDATORY		SMP CONDITIONAL
+	===============	=======================	===========================
+	GENERAL		mb()			smp_mb()
+	WRITE		wmb()			smp_wmb()
+	READ		rmb()			smp_rmb()
+	DATA DEPENDENCY	read_barrier_depends()	smp_read_barrier_depends()
+
+
+All memory barriers except the data dependency barriers imply a compiler
+barrier.  Data dependencies do not impose any additional compiler ordering.
+
+Aside: In the case of data dependencies, the compiler would be expected to
+issue the loads in the correct order (eg. `a[b]` would have to load the
+value of b before loading a[b]), however there is no guarantee in the C
+specification that the compiler may not speculate the value of b (eg. is
+equal to 1) and load a[b] before b (eg. tmp = a[1]; if (b != 1) tmp = a[b];).
+There is also the problem of a compiler reloading b after having loaded
+a[b], thus having a newer copy of b than a[b].  A consensus has not yet been
+reached about these problems, however the READ_ONCE() macro is a good place
+to start looking.
+
+SMP memory barriers are reduced to compiler barriers on uniprocessor
+compiled systems because it is assumed that a CPU will appear to be
+self-consistent, and will order overlapping accesses correctly with respect
+to itself.  However, see the subsection on "Virtual Machine Guests" below.
+
+[!] Note that SMP memory barriers _must_ be used to control the ordering of
+references to shared memory on SMP systems, though the use of locking
+instead is sufficient.
+
+Mandatory barriers should not be used to control SMP effects, since
+mandatory barriers impose unnecessary overhead on both SMP and UP systems.
+They may, however, be used to control MMIO effects on accesses through
+relaxed memory I/O windows.  These barriers are required even on non-SMP
+systems as they affect the order in which memory operations appear to a
+device by prohibiting both the compiler and the CPU from reordering them.
+
+
+There are some more advanced barrier functions:
+
+ (*) smp_store_mb(var, value)
+
+     This assigns the value to the variable and then inserts a full memory
+     barrier after it.  It isn't guaranteed to insert anything more than a
+     compiler barrier in a UP compilation.
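+
+     Conceptually (a sketch, not necessarily the exact implementation on
+     every architecture), this is equivalent to:
+
+	WRITE_ONCE(var, value);
+	smp_mb();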
+
+
+ (*) smp_mb__before_atomic();
+ (*) smp_mb__after_atomic();
+
+     These are for use with atomic (such as add, subtract, increment and
+     decrement) functions that don't return a value, especially when used
+     for reference counting.  These functions do not imply memory barriers.
+
+     These are also used for atomic bitop functions that do not return a
+     value (such as set_bit and clear_bit).
+
+     As an example, consider a piece of code that marks an object as being
+     dead and then decrements the object's reference count:
+
+	obj->dead = 1;
+	smp_mb__before_atomic();
+	atomic_dec(&obj->ref_count);
+
+     This makes sure that the death mark on the object is perceived to be
+     set *before* the reference counter is decremented.
+
+     See Documentation/atomic_ops.txt for more information.  See the
+     "Atomic operations" subsection for information on where to use these.
+
+
+ (*) lockless_dereference();
+
+     This can be thought of as a pointer-fetch wrapper around the
+     smp_read_barrier_depends() data-dependency barrier.
+
+     This is also similar to rcu_dereference(), but it is intended for
+     cases where object lifetime is handled by some mechanism other than
+     RCU, for example, when the objects are removed only when the system
+     goes down.  In addition, lockless_dereference() is used in some data
+     structures that can be used both with and without RCU.
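+
+     For illustration, a usage sketch (the names are made up for this
+     example):
+
+	p = lockless_dereference(gp);
+	if (p)
+		do_something_with(p->a);
+
+     As with rcu_dereference(), the dependency ordering guarantees that the
+     load of p->a cannot see the pre-initialisation contents of a structure
+     just published through gp.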
+
+
+ (*) dma_wmb();
+ (*) dma_rmb();
+
+     These are for use with consistent memory to guarantee the ordering of
+     writes or reads of shared memory accessible to both the CPU and a
+     DMA capable device.
+
+     For example, consider a device driver that shares memory with a
+     device and uses a descriptor status value to indicate if the
+     descriptor belongs to the device or the CPU, and a doorbell to notify
+     it when new descriptors are available:
+
+	if (desc->status != DEVICE_OWN) {
+		/* do not read data until we own descriptor */
+		dma_rmb();
+
+		/* read/modify data */
+		read_data = desc->data;
+		desc->data = write_data;
+
+		/* flush modifications before status update */
+		dma_wmb();
+
+		/* assign ownership */
+		desc->status = DEVICE_OWN;
+
+		/* force memory to sync before notifying device via MMIO */
+		wmb();
+
+		/* notify device of new descriptors */
+		writel(DESC_NOTIFY, doorbell);
+	}
+
+     The dma_rmb() allows us to guarantee the device has released ownership
+     before we read the data from the descriptor, and the dma_wmb() allows
+     us to guarantee the data is written to the descriptor before the
+     device can see it now has ownership.  The wmb() is needed to guarantee
+     that the cache coherent memory writes have completed before attempting
+     a write to the cache incoherent MMIO region.
+
+     See Documentation/DMA-API.txt for more information on consistent
+     memory.
+
+
+MMIO WRITE BARRIER
+------------------
+
+The Linux kernel also has a special barrier for use with memory-mapped I/O
+writes:
+
+	mmiowb();
+
+This is a variation on the mandatory write barrier that causes writes to
+weakly ordered I/O regions to be partially ordered.  Its effects may go
+beyond the CPU->Hardware interface and actually affect the hardware at some
+level.
+
+See the subsection "Acquires vs I/O accesses" for more information.
+
+
+===============================
+IMPLICIT KERNEL MEMORY BARRIERS
+===============================
+
+Some of the other functions in the Linux kernel imply memory barriers,
+amongst which are locking and scheduling functions.
+
+This specification is a _minimum_ guarantee; any particular architecture may
+provide more substantial guarantees, but these may not be relied upon
+outside of arch specific code.
+
+
+LOCK ACQUISITION FUNCTIONS
+--------------------------
+
+The Linux kernel has a number of locking constructs:
+
+ (*) spin locks
+ (*) R/W spin locks
+ (*) mutexes
+ (*) semaphores
+ (*) R/W semaphores
+
+In all cases there are variants on "ACQUIRE" operations and "RELEASE"
+operations for each construct.  These operations all imply certain barriers:
+
+ (1) ACQUIRE operation implication:
+
+     Memory operations issued after the ACQUIRE will be completed after the
+     ACQUIRE operation has completed.
+
+     Memory operations issued before the ACQUIRE may be completed after the
+     ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
+     combined with a following ACQUIRE, orders prior stores against
+     subsequent loads and stores.  Note that this is weaker than smp_mb()!
+     The smp_mb__before_spinlock() primitive is free on many architectures.
+
+ (2) RELEASE operation implication:
+
+     Memory operations issued before the RELEASE will be completed before
+     the RELEASE operation has completed.
+
+     Memory operations issued after the RELEASE may be completed before the
+     RELEASE operation has completed.
+
+ (3) ACQUIRE vs ACQUIRE implication:
+
+     All ACQUIRE operations issued before another ACQUIRE operation will be
+     completed before that ACQUIRE operation.
+
+ (4) ACQUIRE vs RELEASE implication:
+
+     All ACQUIRE operations issued before a RELEASE operation will be
+     completed before the RELEASE operation.
+
+ (5) Failed conditional ACQUIRE implication:
+
+     Certain locking variants of the ACQUIRE operation may fail, either due
+     to being unable to get the lock immediately, or due to receiving an
+     unblocked signal whilst asleep waiting for the lock to become
+     available.  Failed locks do not imply any sort of barrier.
+
+[!] Note: one of the consequences of lock ACQUIREs and RELEASEs being only
+one-way barriers is that the effects of instructions outside of a critical
+section may seep into the inside of the critical section.
+
+An ACQUIRE followed by a RELEASE may not be assumed to be a full memory
+barrier because it is possible for an access preceding the ACQUIRE to
+happen after the ACQUIRE, and an access following the RELEASE to happen
+before the RELEASE, and the two accesses can themselves then cross:
+
+	*A = a;
+	ACQUIRE M
+	RELEASE M
+	*B = b;
+
+may occur as:
+
+	ACQUIRE M, STORE *B, STORE *A, RELEASE M
+
+When the ACQUIRE and RELEASE are a lock acquisition and release,
+respectively, this same reordering can occur if the lock's ACQUIRE and
+RELEASE are to the same lock variable, but only from the perspective of
+another CPU not holding that lock.  In short, an ACQUIRE followed by a
+RELEASE may -not- be assumed to be a full memory barrier.
+
+Similarly, the reverse case of a RELEASE followed by an ACQUIRE does not
+imply a full memory barrier.  Therefore, the CPU's execution of the critical
+sections corresponding to the RELEASE and the ACQUIRE can cross, so that:
+
+	*A = a;
+	RELEASE M
+	ACQUIRE N
+	*B = b;
+
+could occur as:
+
+	ACQUIRE N, STORE *B, STORE *A, RELEASE M
+
+It might appear that this reordering could introduce a deadlock.  However,
+this cannot happen because if such a deadlock threatened, the RELEASE would
+simply complete, thereby avoiding the deadlock.
+
+	Why does this work?
+
+	One key point is that we are only talking about the CPU doing the
+	reordering, not the compiler.  If the compiler (or, for that matter,
+	the developer) switched the operations, deadlock -could- occur.
+
+	But suppose the CPU reordered the operations.  In this case, the
+	unlock precedes the lock in the assembly code.  The CPU simply
+	elected to try executing the later lock operation first.  If there
+	is a deadlock, this lock operation will simply spin (or try to
+	sleep, but more on that later).  The CPU will eventually execute the
+	unlock operation (which preceded the lock operation in the assembly
+	code), which will unravel the potential deadlock, allowing the lock
+	operation to succeed.
+
+	But what if the lock is a sleeplock?  In that case, the code will
+	try to enter the scheduler, where it will eventually encounter a
+	memory barrier, which will force the earlier unlock operation to
+	complete, again unraveling the deadlock.  There might be a
+	sleep-unlock race, but the locking primitive needs to resolve such
+	races properly in any case.
+
+Locks and semaphores may not provide any guarantee of ordering on UP
+compiled systems, and so cannot be counted on in such a situation to
+actually achieve anything at all - especially with respect to I/O accesses -
+unless combined with interrupt disabling operations.
+
+See also the section on "Inter-CPU acquiring barrier effects".
+
+
+As an example, consider the following:
+
+	*A = a;
+	*B = b;
+	ACQUIRE
+	*C = c;
+	*D = d;
+	RELEASE
+	*E = e;
+	*F = f;
+
+The following sequence of events is acceptable:
+
+	ACQUIRE, {*F,*A}, *E, {*C,*D}, *B, RELEASE
+
+	[+] Note that {*F,*A} indicates a combined access.
+
+But none of the following are:
+
+	{*F,*A}, *B,	ACQUIRE, *C, *D,	RELEASE, *E
+	*A, *B, *C,	ACQUIRE, *D,		RELEASE, *E, *F
+	*A, *B,		ACQUIRE, *C,		RELEASE, *D, *E, *F
+	*B,		ACQUIRE, *C, *D,	RELEASE, {*F,*A}, *E
+
+
+
+INTERRUPT DISABLING FUNCTIONS
+-----------------------------
+
+Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
+(RELEASE equivalent) will act as compiler barriers only.  So if memory or
+I/O barriers are required in such a situation, they must be provided from
+some other means.
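+
+For example (a sketch; the device and register names are made up), a driver
+cannot rely on interrupt disabling alone to order relaxed MMIO writes:
+
+	local_irq_save(flags);
+	writel_relaxed(val, dev->data_reg);
+	wmb();			/* still needed: disabling interrupts is
+				   only a compiler barrier */
+	writel_relaxed(CMD_GO, dev->ctrl_reg);
+	local_irq_restore(flags);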
+
+
+SLEEP AND WAKE-UP FUNCTIONS
+---------------------------
+
+Sleeping and waking on an event flagged in global data can be viewed as an
+interaction between two pieces of data: the task state of the task waiting
+for the event and the global data used to indicate the event.  To make sure
+that these appear to happen in the right order, the primitives to begin the
+process of going to sleep, and the primitives to initiate a wake up imply
+certain barriers.
+
+Firstly, the sleeper normally follows something like this sequence of
+events:
+
+	for (;;) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		if (event_indicated)
+			break;
+		schedule();
+	}
+
+A general memory barrier is interpolated automatically by
+set_current_state() after it has altered the task state:
+
+	CPU 1
+	===============================
+	set_current_state();
+	  smp_store_mb();
+	    STORE current->state
+	    <general barrier>
+	LOAD event_indicated
+
+set_current_state() may be wrapped by:
+
+	prepare_to_wait();
+	prepare_to_wait_exclusive();
+
+which therefore also imply a general memory barrier after setting the state.
+The whole sequence above is available in various canned forms, all of which
+interpolate the memory barrier in the right place:
+
+	wait_event();
+	wait_event_interruptible();
+	wait_event_interruptible_exclusive();
+	wait_event_interruptible_timeout();
+	wait_event_killable();
+	wait_event_timeout();
+	wait_on_bit();
+	wait_on_bit_lock();
+
+
+Secondly, code that performs a wake up normally follows something like this:
+
+	event_indicated = 1;
+	wake_up(&event_wait_queue);
+
+or:
+
+	event_indicated = 1;
+	wake_up_process(event_daemon);
+
+A write memory barrier is implied by wake_up() and co. if and only if they
+wake something up.  The barrier occurs before the task state is cleared, and
+so sits between the STORE to indicate the event and the STORE to set
+TASK_RUNNING:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	set_current_state();		STORE event_indicated
+	  smp_store_mb();		wake_up();
+	    STORE current->state	  <write barrier>
+	    <general barrier>		  STORE current->state
+	LOAD event_indicated
+
+To repeat, this write memory barrier is present if and only if something is
+actually awakened.  To see this, consider the following sequence of events,
+where X and Y are both initially zero:
+
+	CPU 1				CPU 2
+	===============================	===============================
+	X = 1;				STORE event_indicated
+	smp_mb();			wake_up();
+	Y = 1;				wait_event(wq, Y == 1);
+	wake_up();			  load from Y sees 1, no memory barrier
+					load from X might see 0
+
+In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
+to see 1.
+
+The available waker functions include:
+
+	complete();
+	wake_up();
+	wake_up_all();
+	wake_up_bit();
+	wake_up_interruptible();
+	wake_up_interruptible_all();
+	wake_up_interruptible_nr();
+	wake_up_interruptible_poll();
+	wake_up_interruptible_sync();
+	wake_up_interruptible_sync_poll();
+	wake_up_locked();
+	wake_up_locked_poll();
+	wake_up_nr();
+	wake_up_poll();
+	wake_up_process();
+
+
+[!] Note that the memory barriers implied by the sleeper and the waker do
+_not_ order multiple stores before the wake-up with respect to loads of
+those stored values after the sleeper has called set_current_state().  For
+instance, if the sleeper does:
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	if (event_indicated)
+		break;
+	__set_current_state(TASK_RUNNING);
+	do_something(my_data);
+
+and the waker does:
+
+	my_data = value;
+	event_indicated = 1;
+	wake_up(&event_wait_queue);
+
+there's no guarantee that the change to event_indicated will be perceived by
+the sleeper as coming after the change to my_data.  In such a circumstance,
+the code on both sides must interpolate its own memory barriers between the
+separate data accesses.  Thus the above sleeper ought to do:
+
+	set_current_state(TASK_INTERRUPTIBLE);
+	if (event_indicated) {
+		smp_rmb();
+		do_something(my_data);
+	}
+
+and the waker should do:
+
+	my_data = value;
+	smp_wmb();
+	event_indicated = 1;
+	wake_up(&event_wait_queue);
+
+
+MISCELLANEOUS FUNCTIONS
+-----------------------
+
+Other functions that imply barriers:
+
+ (*) schedule() and similar imply full memory barriers.
+
+
+===================================
+INTER-CPU ACQUIRING BARRIER EFFECTS
+===================================
+
+On SMP systems locking primitives give a more substantial form of barrier: one
+that does affect memory access ordering on other CPUs, within the context of
+conflict on any particular lock.
+
+
+ACQUIRES VS MEMORY ACCESSES
+---------------------------
+
+Consider the following: the system has a pair of spinlocks (M) and (Q), and
+three CPUs; then should the following sequence of events occur:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ WRITE_ONCE(*A, a); WRITE_ONCE(*E, e);
+ ACQUIRE M ACQUIRE Q
+ WRITE_ONCE(*B, b); WRITE_ONCE(*F, f);
+ WRITE_ONCE(*C, c); WRITE_ONCE(*G, g);
+ RELEASE M RELEASE Q
+ WRITE_ONCE(*D, d); WRITE_ONCE(*H, h);
+
+Then there is no guarantee as to what order CPU 3 will see the accesses to *A
+through *H occur in, other than the constraints imposed by the separate locks
+on the separate CPUs.  It might, for example, see:
+
+ *E, ACQUIRE M, ACQUIRE Q, *G, *C, *F, *A, *B, RELEASE Q, *D, *H, RELEASE M
+
+But it won't see any of:
+
+ *B, *C or *D preceding ACQUIRE M
+ *A, *B or *C following RELEASE M
+ *F, *G or *H preceding ACQUIRE Q
+ *E, *F or *G following RELEASE Q
+
+
+
+ACQUIRES VS I/O ACCESSES
+------------------------
+
+Under certain circumstances (especially involving NUMA), I/O accesses within
+two spinlocked sections on two different CPUs may be seen as interleaved by
+the PCI bridge, because the PCI bridge does not necessarily participate in
+the cache-coherence protocol, and is therefore incapable of honouring the
+read memory barriers required of it.
+
+For example:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ spin_lock(Q)
+ writel(0, ADDR)
+ writel(1, DATA);
+ spin_unlock(Q);
+ spin_lock(Q);
+ writel(4, ADDR);
+ writel(5, DATA);
+ spin_unlock(Q);
+
+may be seen by the PCI bridge as follows:
+
+ STORE *ADDR = 0, STORE *ADDR = 4, STORE *DATA = 1, STORE *DATA = 5
+
+which would probably cause the hardware to malfunction.
+
+
+What is necessary here is to intervene with an mmiowb() before dropping the
+spinlock, for example:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ spin_lock(Q)
+ writel(0, ADDR)
+ writel(1, DATA);
+ mmiowb();
+ spin_unlock(Q);
+ spin_lock(Q);
+ writel(4, ADDR);
+ writel(5, DATA);
+ mmiowb();
+ spin_unlock(Q);
+
+this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
+before either of the stores issued on CPU 2.
+
+
+Furthermore, following a store by a load from the same device obviates the
+need for the mmiowb(), because the load forces the store to complete before
+the load is performed:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ spin_lock(Q)
+ writel(0, ADDR)
+ a = readl(DATA);
+ spin_unlock(Q);
+ spin_lock(Q);
+ writel(4, ADDR);
+ b = readl(DATA);
+ spin_unlock(Q);
+
+
+See Documentation/DocBook/deviceiobook.tmpl for more information.
+
+
+=================================
+WHERE ARE MEMORY BARRIERS NEEDED?
+=================================
+
+Under normal operation, memory operation reordering is generally not going to
+be a problem as a single-threaded linear piece of code will still appear to
+work correctly, even if it's in an SMP kernel.  There are, however, four
+circumstances in which reordering definitely _could_ be a problem:
+
+ (*) Interprocessor interaction.
+
+ (*) Atomic operations.
+
+ (*) Accessing devices.
+
+ (*) Interrupts.
+
+
+INTERPROCESSOR INTERACTION
+--------------------------
+
+When there's a system with more than one processor, more than one CPU in the
+system may be working on the same data set at the same time.  This can cause
+synchronisation problems, and the usual way of dealing with them is to use
+locks.  Locks, however, are quite expensive, and so it may be preferable to
+operate without the use of a lock if at all possible.  In such a case
+operations that affect both CPUs may have to be carefully ordered to prevent
+a malfunction.
+
+Consider, for example, the R/W semaphore slow path.  Here a waiting process is
+queued on the semaphore, by virtue of it having a piece of its stack linked to
+the semaphore's list of waiting processes:
+
+ struct rw_semaphore {
+ ...
+ spinlock_t lock;
+ struct list_head waiters;
+ };
+
+ struct rwsem_waiter {
+ struct list_head list;
+ struct task_struct *task;
+ };
+
+To wake up a particular waiter, the up_read() or up_write() functions have to:
+
+ (1) read the next pointer from this waiter's record to know as to where the
+     next waiter record is;
+
+ (2) read the pointer to the waiter's task structure;
+
+ (3) clear the task pointer to tell the waiter it has been given the
+     semaphore;
+
+ (4) call wake_up_process() on the task; and
+
+ (5) release the reference held on the waiter's task struct.
+
+In other words, it has to perform this sequence of events:
+
+ LOAD waiter->list.next;
+ LOAD waiter->task;
+ STORE waiter->task;
+ CALL wakeup
+ RELEASE task
+
+and if any of these steps occur out of order, then the whole thing may
+malfunction.
+
+Once it has queued itself and dropped the semaphore lock, the waiter does not
+get the lock again; it instead just waits for its task pointer to be cleared
+before proceeding.  Since the record is on the waiter's stack, this means that
+if the task pointer is cleared _before_ the next pointer in the list is read,
+another CPU might start processing the waiter and might clobber the waiter's
+stack before the up*() function has a chance to read the next pointer.
+
+Consider then what might happen to the above sequence of events:
+
+ CPU 1 CPU 2
+ =============================== ===============================
+ down_xxx()
+ Queue waiter
+ Sleep
+ up_yyy()
+ LOAD waiter->task;
+ STORE waiter->task;
+ Woken up by other event
+ <preempt>
+ Resume processing
+ down_xxx() returns
+ call foo()
+ foo() clobbers *waiter
+ </preempt>
+ LOAD waiter->list.next;
+ --- OOPS ---
+
+This could be dealt with using the semaphore lock, but then the down_xxx()
+function has to needlessly get the spinlock again after being woken up.
+
+The way to deal with this is to insert a general SMP memory barrier:
+
+ LOAD waiter->list.next;
+ LOAD waiter->task;
+ smp_mb();
+ STORE waiter->task;
+ CALL wakeup
+ RELEASE task
+
+In this case, the barrier makes a guarantee that all memory accesses before
+the barrier will appear to happen before all the memory accesses after the
+barrier with respect to the other CPUs on the system.  It does _not_ guarantee
+that all the memory accesses before the barrier will be complete by the time
+the barrier instruction itself is complete.
+
+On a UP system - where this wouldn't be a problem - the smp_mb() is just a
+compiler barrier, thus making sure the compiler emits the instructions in the
+right order without actually intervening in the CPU.  Since there's only one
+CPU, that CPU's dependency ordering logic will take care of everything else.
+
+
+ATOMIC OPERATIONS
+-----------------
+
+Whilst they are technically interprocessor interaction considerations, atomic
+operations are noted specially as some of them imply full memory barriers and
+some don't, but they're very heavily relied on as a group throughout the
+kernel.
+
+Any atomic operation that modifies some state in memory and returns
+information about the state (old or new) implies an SMP-conditional general
+memory barrier (smp_mb()) on each side of the actual operation.  These
+include:
+
+ xchg();
+ atomic_xchg(); atomic_long_xchg();
+ atomic_inc_return(); atomic_long_inc_return();
+ atomic_dec_return(); atomic_long_dec_return();
+ atomic_add_return(); atomic_long_add_return();
+ atomic_sub_return(); atomic_long_sub_return();
+ atomic_inc_and_test(); atomic_long_inc_and_test();
+ atomic_dec_and_test(); atomic_long_dec_and_test();
+ atomic_sub_and_test(); atomic_long_sub_and_test();
+ atomic_add_negative(); atomic_long_add_negative();
+ test_and_set_bit();
+ test_and_clear_bit();
+ test_and_change_bit();
+
+	/* when the exchange succeeds */
+ cmpxchg();
+ atomic_cmpxchg(); atomic_long_cmpxchg();
+ atomic_add_unless(); atomic_long_add_unless();
+
+These are used for such things as implementing ACQUIRE-class and RELEASE-class
+operations and adjusting reference counters towards object destruction, and as
+such the implicit memory barrier effects are necessary.
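+
+For example, object destruction driven by a reference count leans on exactly
+this implicit barrier.  A minimal sketch (struct obj and obj_put() are made up
+for illustration):
+
+	struct obj {
+		atomic_t refcount;
+		/* ... payload ... */
+	};
+
+	void obj_put(struct obj *obj)
+	{
+		/*
+		 * atomic_dec_and_test() implies a full memory barrier on
+		 * each side, so every earlier use of *obj is ordered before
+		 * the free on the final put.
+		 */
+		if (atomic_dec_and_test(&obj->refcount))
+			kfree(obj);
+	}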
+
+
+The following operations are potential problems as they do _not_ imply memory
+barriers, but might be used for implementing such things as RELEASE-class
+operations:
+
+ atomic_set();
+ set_bit();
+ clear_bit();
+ change_bit();
+
+With these the appropriate explicit memory barrier should be used if necessary
+(smp_mb__before_atomic() for instance).
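+
+A hedged sketch of giving clear_bit() RELEASE-like semantics this way (the
+PENDING bit and the surrounding structure are hypothetical):
+
+	obj->data = new_value;		/* publish the data first */
+	smp_mb__before_atomic();	/* order the store above before the bit op */
+	clear_bit(PENDING, &obj->flags);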
+
+
+The following also do _not_ imply memory barriers, and so may require explicit
+memory barriers under some circumstances (smp_mb__before_atomic() for
+instance):
+
+ atomic_add();
+ atomic_sub();
+ atomic_inc();
+ atomic_dec();
+
+If they're used for statistics generation, then they probably don't need
+memory barriers, unless there's a coupling between statistical data.
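+
+A minimal sketch of such barrier-less statistics accounting (the counter is
+hypothetical):
+
+	static atomic_t rx_packets;
+
+	/* only the count matters; no ordering against other data is needed */
+	atomic_inc(&rx_packets);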
+
+If they're used for reference counting on an object to control its lifetime,
+they probably don't need memory barriers because either the reference count
+will be adjusted inside a locked section, or the caller will already hold
+sufficient references to make the lock, and thus a memory barrier unnecessary.
+
+If they're used for constructing a lock of some description, then they
+probably do need memory barriers as a lock primitive generally has to do
+things in a specific order.
+
+Basically, each usage case has to be carefully considered as to whether memory
+barriers are needed or not.
+
+The following operations are special locking primitives:
+
+ test_and_set_bit_lock();
+ clear_bit_unlock();
+ __clear_bit_unlock();
+
+These implement ACQUIRE-class and RELEASE-class operations.  These should be
+used in preference to other operations when implementing locking primitives,
+because their implementations can be optimised on many architectures.
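+
+A minimal usage sketch, treating one bit of a flags word as a lock (the bit
+number and the word are hypothetical):
+
+	while (test_and_set_bit_lock(MY_LOCK_BIT, &word))
+		cpu_relax();			/* ACQUIRE on success */
+
+	/* ... critical section ... */
+
+	clear_bit_unlock(MY_LOCK_BIT, &word);	/* RELEASE */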
+
+[!] Note that special memory barrier primitives are available for these
+situations because on some CPUs the atomic instructions used imply full memory
+barriers, and so barrier instructions are superfluous in conjunction with
+them, and in such cases the special barrier primitives will be no-ops.
+
+See Documentation/atomic_ops.txt for more information.
+
+
+ACCESSING DEVICES
+-----------------
+
+Many devices can be memory mapped, and so appear to the CPU as if they're just
+a set of memory locations.  To control such a device, the driver usually has
+to make the right memory accesses in exactly the right order.
+
+However, having a clever CPU or a clever compiler creates a potential problem
+in that the carefully sequenced accesses in the driver code won't reach the
+device in the requisite order if the CPU or the compiler thinks it is more
+efficient to reorder, combine or merge accesses - something that would cause
+the device to malfunction.
+
+Inside of the Linux kernel, I/O should be done through the appropriate
+accessor routines - such as inb() or writel() - which know how to make such
+accesses appropriately sequential.  Whilst this, for the most part, renders
+the explicit use of memory barriers unnecessary, there are a couple of
+situations where they might be needed:
+
+ (1) On some systems, I/O stores are not strongly ordered across all CPUs, and
+     so for _all_ general drivers locks should be used and mmiowb() must be
+     issued prior to unlocking the critical section.
+
+ (2) If the accessor functions are used to refer to an I/O memory window with
+     relaxed memory access properties, then _mandatory_ memory barriers are
+     required to enforce ordering.
+
+See Documentation/DocBook/deviceiobook.tmpl for more information.
+
+
+INTERRUPTS
+----------
+
+A driver may be interrupted by its own interrupt service routine, and thus the
+two parts of the driver may interfere with each other's attempts to control or
+access the device.
+
+This may be alleviated - at least in part - by disabling local interrupts (a
+form of locking), such that the critical operations are all contained within
+the interrupt-disabled section in the driver.  Whilst the driver's interrupt
+routine is executing, the driver's core may not run on the same CPU, and its
+interrupt is not permitted to happen again until the current interrupt has
+been handled, thus the interrupt handler does not need to lock against that.
+
+However, consider a driver that was talking to an ethernet card that sports an
+address register and a data register.  If that driver's core talks to the card
+under interrupt-disablement and then the driver's interrupt handler is
+invoked:
+
+ LOCAL IRQ DISABLE
+ writew(ADDR, 3);
+ writew(DATA, y);
+ LOCAL IRQ ENABLE
+ <interrupt>
+ writew(ADDR, 4);
+ q = readw(DATA);
+ </interrupt>
+
+The store to the data register might happen after the second store to the
+address register if ordering rules are sufficiently relaxed:
+
+ STORE *ADDR = 3, STORE *ADDR = 4, STORE *DATA = y, q = LOAD *DATA
+
+
+If ordering rules are relaxed, it must be assumed that accesses done inside an
+interrupt disabled section may leak outside of it and may interleave with
+accesses performed in an interrupt - and vice versa - unless implicit or
+explicit barriers are used.
+
+Normally this won't be a problem because the I/O accesses done inside such
+sections will include synchronous load operations on strictly ordered I/O
+registers that form implicit I/O barriers.  If this isn't sufficient then an
+mmiowb() may need to be used explicitly.
+
+
+A similar situation may occur between an interrupt routine and two routines
+running on separate CPUs that communicate with each other.  If such a case is
+likely, then interrupt-disabling locks should be used to guarantee ordering;
+a minimal sketch follows.
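+
+A hedged sketch of such an interrupt-disabling lock (the device, its lock and
+its registers are hypothetical):
+
+	/* driver core, possibly running on another CPU */
+	spin_lock_irqsave(&dev->lock, flags);
+	writew(3, dev->addr);
+	writew(y, dev->data);
+	spin_unlock_irqrestore(&dev->lock, flags);
+
+	/* the interrupt handler takes the same lock */
+	static irqreturn_t my_irq(int irq, void *cookie)
+	{
+		struct my_dev *dev = cookie;
+
+		spin_lock(&dev->lock);
+		writew(4, dev->addr);
+		dev->q = readw(dev->data);
+		spin_unlock(&dev->lock);
+		return IRQ_HANDLED;
+	}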
+
+
+==========================
+KERNEL I/O BARRIER EFFECTS
+==========================
+
+When accessing I/O memory, drivers should use the appropriate accessor
+functions:
+
+ (*) inX(), outX():
+
+     These are intended to talk to I/O space rather than memory space, but
+     that's primarily a CPU-specific concept.  The i386 and x86_64 processors
+     do indeed have special I/O space access cycles and instructions, but many
+     CPUs don't have such a concept.
+
+     The PCI bus, amongst others, defines an I/O space concept which - on such
+     CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O space.
+     However, it may also be mapped as a virtual I/O space in the CPU's memory
+     map, particularly on those CPUs that don't support alternate I/O spaces.
+
+     Accesses to this space may be fully synchronous (as on i386), but
+     intermediary bridges (such as the PCI host bridge) may not fully honour
+     that.
+
+     They are guaranteed to be fully ordered with respect to each other.
+
+     They are not guaranteed to be fully ordered with respect to other types
+     of memory and I/O operation.
+
+ (*) readX(), writeX():
+
+     Whether these are guaranteed to be fully ordered and uncombined with
+     respect to each other on the issuing CPU depends on the characteristics
+     defined for the memory window through which they're accessing.  On later
+     i386 architecture machines, for example, this is controlled by way of the
+     MTRR registers.
+
+     Ordinarily, these will be guaranteed to be fully ordered and uncombined,
+     provided they're not accessing a prefetchable device.
+
+     However, intermediary hardware (such as a PCI bridge) may indulge in
+     deferral if it so wishes; to flush a store, a load from the same location
+     is preferred[*], but a load from the same device or from configuration
+     space should suffice for PCI.
+
+     [*] NOTE! attempting to load from the same location as was written to may
+	 cause a malfunction - consider the 16550 Rx/Tx serial registers for
+	 example.
+
+     Used with prefetchable I/O memory, an mmiowb() barrier may be required to
+     force stores to be ordered.
+
+     Please refer to the PCI specification for more information on
+     interactions between PCI transactions.
+
+ (*) readX_relaxed(), writeX_relaxed()
+
+     These are similar to readX() and writeX(), but provide weaker memory
+     ordering guarantees.  Specifically, they do not guarantee ordering with
+     respect to normal memory accesses (e.g. DMA buffers) nor do they
+     guarantee ordering with respect to LOCK or UNLOCK operations.  If the
+     latter is required, an mmiowb() barrier can be used.  Note that relaxed
+     accesses to the same peripheral are guaranteed to be ordered with respect
+     to each other.
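+
+     As a hedged sketch of the distinction (the descriptor layout and the
+     doorbell register are hypothetical): stores to a DMA descriptor in
+     normal memory are not ordered against a relaxed doorbell write, so an
+     explicit barrier, or a plain writel(), is required:
+
+	desc->addr = buf_dma;			/* normal (DMA) memory */
+	desc->len  = len;
+	wmb();					/* order descriptor vs doorbell */
+	writel_relaxed(1, dev->doorbell);	/* relaxed MMIO write */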
+
+ (*) ioreadX(), iowriteX()
+
+     These will perform appropriately for the type of access they're actually
+     doing, be it inX()/outX() or readX()/writeX().
+
+
+========================================
+ASSUMED MINIMUM EXECUTION ORDERING MODEL
+========================================
+
+It has to be assumed that the conceptual CPU is weakly-ordered but that it
+will maintain the appearance of program causality with respect to itself.
+Some CPUs (such as i386 or x86_64) are more constrained than others (such as
+powerpc or frv), and so the most relaxed case (namely DEC Alpha) must be
+assumed outside of arch-specific code.
+
+This means that it must be considered that the CPU will execute its
+instruction stream in any order it feels like - or even in parallel - provided
+that if an instruction in the stream depends on an earlier instruction, then
+that earlier instruction must be sufficiently complete[*] before the later
+instruction may proceed; in other words: provided that the appearance of
+causality is maintained.
+
+ [*] Some instructions have more than one effect - such as changing the
+     condition codes, changing registers or changing memory - and different
+     instructions may depend on different effects.
+
+A CPU may also discard any instruction sequence that winds up having no
+ultimate effect.  For example, if two adjacent instructions both load an
+immediate value into the same register, the first may be discarded.
+
+
+Similarly, it has to be assumed that the compiler might reorder the
+instruction stream in any way it sees fit, again provided the appearance of
+causality is maintained.
+
+
+============================
+THE EFFECTS OF THE CPU CACHE
+============================
+
+The way cached memory operations are perceived across the system is affected
+to a certain extent by the caches that lie between CPUs and memory, and by the
+memory coherence system that maintains the consistency of state in the system.
+
+As far as the way a CPU interacts with another part of the system through the
+caches goes, the memory system has to include the CPU's caches, and memory
+barriers for the most part act at the interface between the CPU and its cache
+(memory barriers logically act on the dotted line in the following diagram):
+
+ <--- CPU ---> : <----------- Memory ----------->
+ :
+ +--------+ +--------+ : +--------+ +-----------+
+ | | | | : | | | | +--------+
+ | CPU | | Memory | : | CPU | | | | |
+ | Core |--->| Access |----->| Cache |<-->| | | |
+ | | | Queue | : | | | |--->| Memory |
+ | | | | : | | | | | |
+ +--------+ +--------+ : +--------+ | | | |
+ : | Cache | +--------+
+ : | Coherency |
+ : | Mechanism | +--------+
+ +--------+ +--------+ : +--------+ | | | |
+ | | | | : | | | | | |
+ | CPU | | Memory | : | CPU | | |--->| Device |
+ | Core |--->| Access |----->| Cache |<-->| | | |
+ | | | Queue | : | | | | | |
+ | | | | : | | | | +--------+
+ +--------+ +--------+ : +--------+ +-----------+
+ :
+ :
+
+Although any particular load or store may not actually appear outside of the
+CPU that issued it since it may have been satisfied within the CPU's own
+cache, it will still appear as if the full memory access had taken place as
+far as the other CPUs are concerned since the cache coherency mechanisms will
+migrate the cacheline over to the accessing CPU and propagate the effects upon
+conflict.
+
+The CPU core may execute instructions in any order it deems fit, provided the
+expected program causality appears to be maintained.  Some of the instructions
+generate load and store operations which then go into the queue of memory
+accesses to be performed.  The core may place these in the queue in any order
+it wishes, and continue execution until it is forced to wait for an
+instruction to complete.
+
+What memory barriers are concerned with is controlling the order in which
+accesses cross from the CPU side of things to the memory side of things, and
+the order in which the effects are perceived to happen by the other observers
+in the system.
+
+[!] Memory barriers are _not_ needed within a given CPU, as CPUs always see
+their own loads and stores as if they had happened in program order.
+
+[!] MMIO or other device accesses may bypass the cache system.  This depends
+on the properties of the memory window through which devices are accessed
+and/or the use of any special device communication instructions the CPU may
+have.
+
+
+CACHE COHERENCY
+---------------
+
+Life isn't quite as simple as it may appear above, however: for while the
+caches are expected to be coherent, there's no guarantee that that coherency
+will be ordered.  This means that whilst changes made on one CPU will
+eventually become visible on all CPUs, there's no guarantee that they will
+become apparent in the same order on those other CPUs.
+
+
+Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
+has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D),
+connected as follows:
+
+ :
+ : +--------+
+ : +---------+ | |
+ +--------+ : +--->| Cache A |<------->| |
+ | | : | +---------+ | |
+ | CPU 1 |<---+ | |
+ | | : | +---------+ | |
+ +--------+ : +--->| Cache B |<------->| |
+ : +---------+ | |
+ : | Memory |
+ : +---------+ | System |
+ +--------+ : +--->| Cache C |<------->| |
+ | | : | +---------+ | |
+ | CPU 2 |<---+ | |
+ | | : | +---------+ | |
+ +--------+ : +--->| Cache D |<------->| |
+ : +---------+ | |
+ : +--------+
+ :
+
+Imagine the system has the following properties:
+
+ (*) an odd-numbered cache line may be in cache A, cache C or it may still be
+     resident in memory;
+
+ (*) an even-numbered cache line may be in cache B, cache D or it may still be
+     resident in memory;
+
+ (*) whilst the CPU core is interrogating one cache, the other cache may be
+     making use of the bus to access the rest of the system - perhaps to
+     displace a dirty cacheline or to do a speculative load;
+
+ (*) each cache has a queue of operations that need to be applied to that
+     cache to maintain coherency with the rest of the system;
+
+ (*) the coherency queue is not flushed by normal loads to lines already
+     present in the cache, even though the contents of the queue may
+     potentially affect those loads.
+
+Imagine, then, that two writes are made on the first CPU, with a write barrier
+between them to guarantee that they will appear to reach that CPU's caches in
+the requisite order:
+
+	CPU 1		CPU 2		COMMENT
+	===============	===============	=======================================
+					u == 0, v == 1 and p == &u, q == &u
+	v = 2;
+	smp_wmb();			Make sure change to v is visible
+					 before change to p
+	<A:modify v=2>			v is now in cache A exclusively
+	p = &v;
+	<B:modify p=&v>			p is now in cache B exclusively
+
+The write memory barrier forces the other CPUs in the system to perceive that
+the local CPU's caches have apparently been updated in the correct order.  But
+now imagine that the second CPU wants to read those values:
+
+ CPU 1 CPU 2 COMMENT
+ =============== =============== =======================================
+ ...
+ q = p;
+ x = *q;
+
+The above pair of reads may then fail to happen in the expected order, as the
+cacheline holding p may get updated in one of the second CPU's caches whilst
+the update to the cacheline holding v is delayed in the other of the second
+CPU's caches by some other cache event:
+
+ CPU 1 CPU 2 COMMENT
+ =============== =============== =======================================
+ u == 0, v == 1 and p == &u, q == &u
+ v = 2;
+ smp_wmb();
+ <A:modify v=2> <C:busy>
+ <C:queue v=2>
+ p = &v; q = p;
+ <D:request p>
+ <B:modify p=&v> <D:commit p=&v>
+ <D:read p>
+ x = *q;
+			<C:read *q>	Reads from v before v updated in cache
+ <C:unbusy>
+ <C:commit v=2>
+
+Basically, whilst both cachelines will be updated on CPU 2 eventually, there's
+no guarantee that, without intervention, the order of update will be the same
+as that committed on CPU 1.
+
+
+To intervene, we need to interpolate a data dependency barrier or a read
+barrier between the loads.  This will force the cache to commit its coherency
+queue before processing any further requests:
+
+ CPU 1 CPU 2 COMMENT
+ =============== =============== =======================================
+ u == 0, v == 1 and p == &u, q == &u
+ v = 2;
+ smp_wmb();
+ <A:modify v=2> <C:busy>
+ <C:queue v=2>
+ p = &v; q = p;
+ <D:request p>
+ <B:modify p=&v> <D:commit p=&v>
+ <D:read p>
+ smp_read_barrier_depends()
+ <C:unbusy>
+ <C:commit v=2>
+ x = *q;
+			<C:read *q>	Reads from v after v updated in cache
+
+
+This sort of problem can be encountered on DEC Alpha processors as they have a
+split cache that improves performance by making better use of the data bus.
+Whilst most CPUs do imply a data dependency barrier on the read when a memory
+access depends on a read, not all do, so it may not be relied on.
+
+Other CPUs may also have split caches, but must coordinate between the various
+cachelets for normal memory accesses.  The semantics of the Alpha removes the
+need for coordination in the absence of memory barriers.
+
+
+CACHE COHERENCY VS DMA
+----------------------
+
+Not all systems maintain cache coherency with respect to devices doing DMA.
+In such cases, a device attempting DMA may obtain stale data from RAM because
+dirty cache lines may be resident in the caches of various CPUs, and may not
+have been written back to RAM yet.  To deal with this, the appropriate part of
+the kernel must flush the overlapping bits of cache on each CPU (and maybe
+invalidate them as well).
+
+In addition, the data DMA'd to RAM by a device may be overwritten by dirty
+cache lines being written back to RAM from a CPU's cache after the device has
+installed its own data, or cache lines present in the CPU's cache may simply
+obscure the fact that RAM has been updated, until at such time as the
+cacheline is discarded from the CPU's cache and reloaded.  To deal with this,
+the appropriate part of the kernel must invalidate the overlapping bits of the
+cache on each CPU.
+
+See Documentation/cachetlb.txt for more information on cache management.
+
+
+CACHE COHERENCY VS MMIO
+-----------------------
+
+Memory mapped I/O usually takes place through memory locations that are part
+of a window in the CPU's memory space that has different properties assigned
+than the usual RAM directed window.
+
+Amongst these properties is usually the fact that such accesses bypass the
+caching entirely and go directly to the device buses.  This means MMIO
+accesses may, in effect, overtake accesses to cached memory that were emitted
+earlier.  A memory barrier isn't sufficient in such a case, but rather the
+cache must be flushed between the cached memory write and the MMIO access if
+the two are in any way dependent.
+
+
+=========================
+THE THINGS CPUS GET UP TO
+=========================
+
+A programmer might take it for granted that the CPU will perform memory
+operations in exactly the order specified, so that if the CPU is, for example,
+given the following piece of code to execute:
+
+ a = READ_ONCE(*A);
+ WRITE_ONCE(*B, b);
+ c = READ_ONCE(*C);
+ d = READ_ONCE(*D);
+ WRITE_ONCE(*E, e);
+
+they would then expect that the CPU will complete the memory operation for
+each instruction before moving on to the next one, leading to a definite
+sequence of operations as seen by external observers in the system:
+
+ LOAD *A, STORE *B, LOAD *C, LOAD *D, STORE *E.
+
+
+Reality is, of course, much messier.  With many CPUs and compilers, the above
+assumption doesn't hold because:
+
+ (*) loads are more likely to need to be completed immediately to permit
+     execution progress, whereas stores can often be deferred without a
+     problem;
+
+ (*) loads may be done speculatively, and the result discarded should it prove
+     to have been unnecessary;
+
+ (*) loads may be done speculatively, leading to the result having been
+     fetched at the wrong time in the expected sequence of events;
+
+ (*) the order of the memory accesses may be rearranged to promote better use
+     of the CPU buses and caches;
+
+ (*) loads and stores may be combined to improve performance when talking to
+     memory or I/O hardware that can do batched accesses of adjacent
+     locations, thus cutting down on transaction setup costs (memory and PCI
+     devices may both be able to do this); and
+
+ (*) the CPU's data cache may affect the ordering, and whilst cache-coherency
+     mechanisms may alleviate this - once the store has actually hit the cache
+     - there's no guarantee that the coherency management will be propagated
+     in order to other CPUs.
+
+So what another CPU, say, might actually observe from the above piece of code
+is:
+
+ LOAD *A, ..., LOAD {*C,*D}, STORE *E, STORE *B
+
+ ("LOAD {*C,*D}" ë ìíë ëëìëë)
+
+
+However, it is guaranteed that a CPU will be self-consistent: it will see its
+_own_ accesses appear to be correctly ordered, without the need for a memory
+barrier.  For instance with the following code:
+
+ U = READ_ONCE(*A);
+ WRITE_ONCE(*A, V);
+ WRITE_ONCE(*A, W);
+ X = READ_ONCE(*A);
+ WRITE_ONCE(*A, Y);
+ Z = READ_ONCE(*A);
+
+and assuming no intervention by an external influence, it can be assumed that
+the final result will appear to be:
+
+	U == the original value of *A
+ X == W
+ Z == Y
+ *A == Y
+
+The code above may cause the CPU to generate the full sequence of memory
+accesses:
+
+ U=LOAD *A, STORE *A=V, STORE *A=W, X=LOAD *A, STORE *A=Y, Z=LOAD *A
+
+in that order, but, without intervention, the sequence may have almost any
+combination of elements combined or discarded, provided the program's view of
+the world remains consistent.  Note that READ_ONCE() and WRITE_ONCE() are
+_not_ optional in the above example, as there are architectures where a given
+CPU might reorder successive loads to the same location.  On such
+architectures, READ_ONCE() and WRITE_ONCE() do whatever is necessary to
+prevent this; for example, on Itanium the volatile casts used by READ_ONCE()
+and WRITE_ONCE() cause GCC to emit the special ld.acq and st.rel instructions
+(respectively) that prevent such reordering.
+
+The compiler may also combine, discard or defer elements of the sequence
+before the CPU even sees them.
+
+For instance:
+
+ *A = V;
+ *A = W;
+
+may be reduced to:
+
+ *A = W;
+
+since, without a write barrier or a WRITE_ONCE(), it can be assumed that the
+effect of the storage of V to *A is lost.  Similarly:
+
+ *A = Y;
+ Z = *A;
+
+may, without a memory barrier or a READ_ONCE() and WRITE_ONCE(), be reduced
+to:
+
+ *A = Y;
+ Z = Y;
+
+and the LOAD operation never appear outside of the CPU.
+
+
+AND THEN THERE'S THE ALPHA
+--------------------------
+
+The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
+some versions of the Alpha CPU have a split data cache, permitting them to
+have two semantically-related cache lines updated at separate times.  This is
+where the data dependency barrier really becomes necessary as this
+synchronises both caches with the memory coherence system, thus making it seem
+like pointer changes vs new data occur in the right order.
+
+The Alpha defines the Linux kernel's memory barrier model.
+
+See the subsection on "Cache Coherency" above.
+
+
+VIRTUAL MACHINE GUESTS
+----------------------
+
+Guests running within virtual machines might be affected by SMP effects even
+if the guest itself is compiled without SMP support.  This is an artifact of
+interfacing with an SMP host while running an UP kernel.  Using mandatory
+barriers for this use-case would be possible but is often suboptimal.
+
+To handle this case optimally, low-level virt_mb() etc. macros are available.
+These have the same effect as smp_mb() etc. when SMP is enabled, but generate
+identical code for SMP and non-SMP systems.  For example, virtual machine
+guests should use virt_mb() rather than smp_mb() when synchronizing against a
+(possibly SMP) host.
+
+These are equivalent to their smp_mb() etc. counterparts in all other
+respects; in particular, they do not control MMIO effects: to control MMIO
+effects, use mandatory barriers.
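+
+A minimal sketch of a guest publishing a request to a (possibly SMP) host
+through shared memory (the ring layout is hypothetical):
+
+	ring->slot[idx] = req;		/* fill in the request */
+	virt_wmb();			/* order the fill before the index update */
+	ring->producer = idx + 1;	/* the host polls this index */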
+
+
+============
+EXAMPLE USES
+============
+
+CIRCULAR BUFFERS
+----------------
+
+Memory barriers can be used to implement circular buffering without the need
+of a lock to serialise the producer with the consumer.  A compressed sketch
+of the producer side follows the pointer below; for the full treatment, see:
+
+ Documentation/circular-buffers.txt
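+
+A compressed, hedged sketch of the producer side described there, assuming a
+power-of-two buffer size (names follow that document loosely):
+
+	unsigned long head = buffer->head;
+	/* ACQUIRE pairs with the consumer's RELEASE of ->tail */
+	unsigned long tail = smp_load_acquire(&buffer->tail);
+
+	if (CIRC_SPACE(head, tail, buffer->size) >= 1) {
+		buffer->item[head] = produce_item();
+		/* RELEASE: commit the item before publishing the new head */
+		smp_store_release(&buffer->head,
+				  (head + 1) & (buffer->size - 1));
+	}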
+
+
+==========
+REFERENCES
+==========
+
+Alpha AXP Architecture Reference Manual, Second Edition (Sites & Witek,
+Digital Press)
+ Chapter 5.2: Physical Address Space Characteristics
+ Chapter 5.4: Caches and Write Buffers
+ Chapter 5.5: Data Sharing
+ Chapter 5.6: Read/Write Ordering
+
+AMD64 Architecture Programmer's Manual Volume 2: System Programming
+ Chapter 7.1: Memory-Access Ordering
+ Chapter 7.4: Buffering and Combining Memory Writes
+
+IA-32 Intel Architecture Software Developer's Manual, Volume 3:
+System Programming Guide
+ Chapter 7.1: Locked Atomic Operations
+ Chapter 7.2: Memory Ordering
+ Chapter 7.4: Serializing Instructions
+
+The SPARC Architecture Manual, Version 9
+ Chapter 8: Memory Models
+ Appendix D: Formal Specification of the Memory Models
+ Appendix J: Programming with the Memory Models
+
+UltraSPARC Programmer Reference Manual
+ Chapter 5: Memory Accesses and Cacheability
+ Chapter 15: Sparc-V9 Memory Models
+
+UltraSPARC III Cu User's Manual
+ Chapter 9: Memory Models
+
+UltraSPARC IIIi Processor User's Manual
+ Chapter 8: Memory Models
+
+UltraSPARC Architecture 2005
+ Chapter 9: Memory
+ Appendix D: Formal Specifications of the Memory Models
+
+UltraSPARC T1 Supplement to the UltraSPARC Architecture 2005
+ Chapter 8: Memory Models
+ Appendix F: Caches and Cache Coherency
+
+Solaris Internals, Core Kernel Architecture, p63-68:
+ Chapter 3.3: Hardware Considerations for Locks and
+ Synchronization
+
+Unix Systems for Modern Architectures, Symmetric Multiprocessing and Caching
+for Kernel Programmers:
+ Chapter 13: Other Memory Models
+
+Intel Itanium Architecture Software Developer's Manual: Volume 1:
+ Section 2.6: Speculation
+ Section 4.4: Memory Access
diff --git a/Documentation/locking/lglock.txt b/Documentation/locking/lglock.txt
deleted file mode 100644
index a6971e34fabe..000000000000
--- a/Documentation/locking/lglock.txt
+++ /dev/null
@@ -1,166 +0,0 @@
-lglock - local/global locks for mostly local access patterns
-------------------------------------------------------------
-
-Origin: Nick Piggin's VFS scalability series introduced during
- 2.6.35++ [1] [2]
-Location: kernel/locking/lglock.c
- include/linux/lglock.h
-Users: currently only the VFS and stop_machine related code
-
-Design Goal:
-------------
-
-Improve scalability of globally used large data sets that are
-distributed over all CPUs as per_cpu elements.
-
-To manage global data structures that are partitioned over all CPUs
-as per_cpu elements but can be mostly handled by CPU local actions
-lglock will be used where the majority of accesses are cpu local
-reading and occasional cpu local writing with very infrequent
-global write access.
-
-
-* deal with things locally whenever possible
- - very fast access to the local per_cpu data
- - reasonably fast access to specific per_cpu data on a different
- CPU
-* while making global action possible when needed
- - by expensive access to all CPUs locks - effectively
- resulting in a globally visible critical section.
-
-Design:
--------
-
-Basically it is an array of per_cpu spinlocks with the
-lg_local_lock/unlock accessing the local CPUs lock object and the
-lg_local_lock_cpu/unlock_cpu accessing a remote CPUs lock object
-the lg_local_lock has to disable preemption as migration protection so
-that the reference to the local CPUs lock does not go out of scope.
-Due to the lg_local_lock/unlock only touching cpu-local resources it
-is fast. Taking the local lock on a different CPU will be more
-expensive but still relatively cheap.
-
-One can relax the migration constraints by acquiring the current
-CPUs lock with lg_local_lock_cpu, remember the cpu, and release that
-lock at the end of the critical section even if migrated. This should
-give most of the performance benefits without inhibiting migration
-though needs careful considerations for nesting of lglocks and
-consideration of deadlocks with lg_global_lock.
-
-The lg_global_lock/unlock locks all underlying spinlocks of all
-possible CPUs (including those off-line). The preemption disable/enable
-are needed in the non-RT kernels to prevent deadlocks like:
-
- on cpu 1
-
- task A task B
- lg_global_lock
- got cpu 0 lock
- <<<< preempt <<<<
- lg_local_lock_cpu for cpu 0
- spin on cpu 0 lock
-
-On -RT this deadlock scenario is resolved by the arch_spin_locks in the
-lglocks being replaced by rt_mutexes which resolve the above deadlock
-by boosting the lock-holder.
-
-
-Implementation:
----------------
-
-The initial lglock implementation from Nick Piggin used some complex
-macros to generate the lglock/brlock in lglock.h - they were later
-turned into a set of functions by Andi Kleen [7]. The change to functions
-was motivated by the presence of multiple lock users and also by them
-being easier to maintain than the generating macros. This change to
-functions is also the basis to eliminated the restriction of not
-being initializeable in kernel modules (the remaining problem is that
-locks are not explicitly initialized - see lockdep-design.txt)
-
-Declaration and initialization:
--------------------------------
-
- #include <linux/lglock.h>
-
- DEFINE_LGLOCK(name)
- or:
- DEFINE_STATIC_LGLOCK(name);
-
- lg_lock_init(&name, "lockdep_name_string");
-
- on UP this is mapped to DEFINE_SPINLOCK(name) in both cases, note
- also that as of 3.18-rc6 all declaration in use are of the _STATIC_
- variant (and it seems that the non-static was never in use).
- lg_lock_init is initializing the lockdep map only.
-
-Usage:
-------
-
-From the locking semantics it is a spinlock. It could be called a
-locality aware spinlock. lg_local_* behaves like a per_cpu
-spinlock and lg_global_* like a global spinlock.
-No surprises in the API.
-
- lg_local_lock(*lglock);
- access to protected per_cpu object on this CPU
- lg_local_unlock(*lglock);
-
- lg_local_lock_cpu(*lglock, cpu);
- access to protected per_cpu object on other CPU cpu
- lg_local_unlock_cpu(*lglock, cpu);
-
- lg_global_lock(*lglock);
- access all protected per_cpu objects on all CPUs
- lg_global_unlock(*lglock);
-
- There are no _trylock variants of the lglocks.
-
-Note that the lg_global_lock/unlock has to iterate over all possible
-CPUs rather than the actually present CPUs or a CPU could go off-line
-with a held lock [4] and that makes it very expensive. A discussion on
-these issues can be found at [5]
-
-Constraints:
-------------
-
- * currently the declaration of lglocks in kernel modules is not
- possible, though this should be doable with little change.
- * lglocks are not recursive.
- * suitable for code that can do most operations on the CPU local
- data and will very rarely need the global lock
- * lg_global_lock/unlock is *very* expensive and does not scale
- * on UP systems all lg_* primitives are simply spinlocks
- * in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but
- does not change the tasks state while sleeping [6].
- * in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
- is downgraded to a migrate_disable/enable, the other
- preempt_disable/enable are downgraded to barriers [6].
- The deadlock noted for non-RT above is resolved due to rt_mutexes
- boosting the lock-holder in this case which arch_spin_locks do
- not do.
-
-lglocks were designed for very specific problems in the VFS and probably
-only are the right answer in these corner cases. Any new user that looks
-at lglocks probably wants to look at the seqlock and RCU alternatives as
-her first choice. There are also efforts to resolve the RCU issues that
-currently prevent using RCU in place of view remaining lglocks.
-
-Note on brlock history:
------------------------
-
-The 'Big Reader' read-write spinlocks were originally introduced by
-Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They
-later were introduced by the VFS scalability patch set in 2.6 series
-again as the "big reader lock" brlock [2] variant of lglock which has
-been replaced by seqlock primitives or by RCU based primitives in the
-3.13 kernel series as was suggested in [3] in 2003. The brlock was
-entirely removed in the 3.13 kernel series.
-
-Link: 1 http://lkml.org/lkml/2010/8/2/81
-Link: 2 http://lwn.net/Articles/401738/
-Link: 3 http://lkml.org/lkml/2003/3/9/205
-Link: 4 https://lkml.org/lkml/2011/8/24/185
-Link: 5 http://lkml.org/lkml/2011/12/18/189
-Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
- patch series - lglocks-rt.patch.patch
-Link: 7 http://lkml.org/lkml/2012/3/5/26
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index a4d0a99de04d..ba818ecce6f9 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -609,7 +609,7 @@ third possibility from arising.
The data-dependency barrier must order the read into Q with the store
into *Q. This prohibits this outcome:

- (Q == B) && (B == 4)
+ (Q == &B) && (B == 4)

Please note that this pattern should be rare. After all, the whole point
of dependency ordering is to -prevent- writes to the data structure, along
@@ -1928,6 +1928,7 @@ compiler and the CPU from reordering them.

See Documentation/DMA-API.txt for more information on consistent memory.

+
MMIO WRITE BARRIER
------------------

@@ -2075,7 +2076,7 @@ systems, and so cannot be counted on in such a situation to actually achieve
anything at all - especially with respect to I/O accesses - unless combined
with interrupt disabling operations.

-See also the section on "Inter-CPU locking barrier effects".
+See also the section on "Inter-CPU acquiring barrier effects".


As an example, consider the following:
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce7c59a..0cc8811af4e0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -705,7 +705,6 @@ config PARAVIRT_DEBUG
config PARAVIRT_SPINLOCKS
bool "Paravirtualization layer for spinlocks"
depends on PARAVIRT && SMP
- select UNINLINE_SPIN_UNLOCK if !QUEUED_SPINLOCKS
---help---
Paravirtualized spinlocks allow a pvops backend to replace the
spinlock implementation with something virtualization-friendly
@@ -718,7 +717,7 @@ config PARAVIRT_SPINLOCKS

config QUEUED_LOCK_STAT
bool "Paravirt queued spinlock statistics"
- depends on PARAVIRT_SPINLOCKS && DEBUG_FS && QUEUED_SPINLOCKS
+ depends on PARAVIRT_SPINLOCKS && DEBUG_FS
---help---
Enable the collection of statistical data on the slowpath
behavior of paravirtualized queued spinlocks and report
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 9733361fed6f..97848cdfcb1a 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -158,53 +158,9 @@ extern void __add_wrong_size(void)
* value of "*ptr".
*
* xadd() is locked when multiple CPUs are online
- * xadd_sync() is always locked
- * xadd_local() is never locked
*/
#define __xadd(ptr, inc, lock) __xchg_op((ptr), (inc), xadd, lock)
#define xadd(ptr, inc) __xadd((ptr), (inc), LOCK_PREFIX)
-#define xadd_sync(ptr, inc) __xadd((ptr), (inc), "lock; ")
-#define xadd_local(ptr, inc) __xadd((ptr), (inc), "")
-
-#define __add(ptr, inc, lock) \
- ({ \
- __typeof__ (*(ptr)) __ret = (inc); \
- switch (sizeof(*(ptr))) { \
- case __X86_CASE_B: \
- asm volatile (lock "addb %b1, %0\n" \
- : "+m" (*(ptr)) : "qi" (inc) \
- : "memory", "cc"); \
- break; \
- case __X86_CASE_W: \
- asm volatile (lock "addw %w1, %0\n" \
- : "+m" (*(ptr)) : "ri" (inc) \
- : "memory", "cc"); \
- break; \
- case __X86_CASE_L: \
- asm volatile (lock "addl %1, %0\n" \
- : "+m" (*(ptr)) : "ri" (inc) \
- : "memory", "cc"); \
- break; \
- case __X86_CASE_Q: \
- asm volatile (lock "addq %1, %0\n" \
- : "+m" (*(ptr)) : "ri" (inc) \
- : "memory", "cc"); \
- break; \
- default: \
- __add_wrong_size(); \
- } \
- __ret; \
- })
-
-/*
- * add_*() adds "inc" to "*ptr"
- *
- * __add() takes a lock prefix
- * add_smp() is locked when multiple CPUs are online
- * add_sync() is always locked
- */
-#define add_smp(ptr, inc) __add((ptr), (inc), LOCK_PREFIX)
-#define add_sync(ptr, inc) __add((ptr), (inc), "lock; ")

#define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2) \
({ \
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 2970d22d7766..4cd8db05301f 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -661,8 +661,6 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,

#if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)

-#ifdef CONFIG_QUEUED_SPINLOCKS
-
static __always_inline void pv_queued_spin_lock_slowpath(struct qspinlock *lock,
u32 val)
{
@@ -684,22 +682,6 @@ static __always_inline void pv_kick(int cpu)
PVOP_VCALL1(pv_lock_ops.kick, cpu);
}

-#else /* !CONFIG_QUEUED_SPINLOCKS */
-
-static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
- __ticket_t ticket)
-{
- PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket);
-}
-
-static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
- __ticket_t ticket)
-{
- PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
-}
-
-#endif /* CONFIG_QUEUED_SPINLOCKS */
-
#endif /* SMP && PARAVIRT_SPINLOCKS */

#ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 7fa9e7740ba3..60aac60ba25f 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -301,23 +301,16 @@ struct pv_mmu_ops {
struct arch_spinlock;
#ifdef CONFIG_SMP
#include <asm/spinlock_types.h>
-#else
-typedef u16 __ticket_t;
#endif

struct qspinlock;

struct pv_lock_ops {
-#ifdef CONFIG_QUEUED_SPINLOCKS
void (*queued_spin_lock_slowpath)(struct qspinlock *lock, u32 val);
struct paravirt_callee_save queued_spin_unlock;

void (*wait)(u8 *ptr, u8 val);
void (*kick)(int cpu);
-#else /* !CONFIG_QUEUED_SPINLOCKS */
- struct paravirt_callee_save lock_spinning;
- void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
-#endif /* !CONFIG_QUEUED_SPINLOCKS */
};

/* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 8dbc762ad132..3d33a719f5c1 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -154,7 +154,7 @@ static inline bool __down_write_trylock(struct rw_semaphore *sem)
: "+m" (sem->count), "=&a" (tmp0), "=&r" (tmp1),
CC_OUT(e) (result)
: "er" (RWSEM_ACTIVE_WRITE_BIAS)
- : "memory", "cc");
+ : "memory");
return result;
}

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index be0a05913b91..921bea7a2708 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -20,187 +20,13 @@
* (the type definitions are in asm/spinlock_types.h)
*/

-#ifdef CONFIG_X86_32
-# define LOCK_PTR_REG "a"
-#else
-# define LOCK_PTR_REG "D"
-#endif
-
-#if defined(CONFIG_X86_32) && (defined(CONFIG_X86_PPRO_FENCE))
-/*
- * On PPro SMP, we use a locked operation to unlock
- * (PPro errata 66, 92)
- */
-# define UNLOCK_LOCK_PREFIX LOCK_PREFIX
-#else
-# define UNLOCK_LOCK_PREFIX
-#endif
-
/* How long a lock should spin before we consider blocking */
#define SPIN_THRESHOLD (1 << 15)

extern struct static_key paravirt_ticketlocks_enabled;
static __always_inline bool static_key_false(struct static_key *key);

-#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm/qspinlock.h>
-#else
-
-#ifdef CONFIG_PARAVIRT_SPINLOCKS
-
-static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
-{
- set_bit(0, (volatile unsigned long *)&lock->tickets.head);
-}
-
-#else /* !CONFIG_PARAVIRT_SPINLOCKS */
-static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
- __ticket_t ticket)
-{
-}
-static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
- __ticket_t ticket)
-{
-}
-
-#endif /* CONFIG_PARAVIRT_SPINLOCKS */
-static inline int __tickets_equal(__ticket_t one, __ticket_t two)
-{
- return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
-}
-
-static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
- __ticket_t head)
-{
- if (head & TICKET_SLOWPATH_FLAG) {
- arch_spinlock_t old, new;
-
- old.tickets.head = head;
- new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
- old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
- new.tickets.tail = old.tickets.tail;
-
- /* try to clear slowpath flag when there are no contenders */
- cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
- }
-}
-
-static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
-{
- return __tickets_equal(lock.tickets.head, lock.tickets.tail);
-}
-
-/*
- * Ticket locks are conceptually two parts, one indicating the current head of
- * the queue, and the other indicating the current tail. The lock is acquired
- * by atomically noting the tail and incrementing it by one (thus adding
- * ourself to the queue and noting our position), then waiting until the head
- * becomes equal to the the initial value of the tail.
- *
- * We use an xadd covering *both* parts of the lock, to increment the tail and
- * also load the position of the head, which takes care of memory ordering
- * issues and should be optimal for the uncontended case. Note the tail must be
- * in the high part, because a wide xadd increment of the low part would carry
- * up and contaminate the high part.
- */
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
- register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
-
- inc = xadd(&lock->tickets, inc);
- if (likely(inc.head == inc.tail))
- goto out;
-
- for (;;) {
- unsigned count = SPIN_THRESHOLD;
-
- do {
- inc.head = READ_ONCE(lock->tickets.head);
- if (__tickets_equal(inc.head, inc.tail))
- goto clear_slowpath;
- cpu_relax();
- } while (--count);
- __ticket_lock_spinning(lock, inc.tail);
- }
-clear_slowpath:
- __ticket_check_and_clear_slowpath(lock, inc.head);
-out:
- barrier(); /* make sure nothing creeps before the lock is taken */
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
- arch_spinlock_t old, new;
-
- old.tickets = READ_ONCE(lock->tickets);
- if (!__tickets_equal(old.tickets.head, old.tickets.tail))
- return 0;
-
- new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
- new.head_tail &= ~TICKET_SLOWPATH_FLAG;
-
- /* cmpxchg is a full barrier, so nothing can move before it */
- return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
- if (TICKET_SLOWPATH_FLAG &&
- static_key_false(&paravirt_ticketlocks_enabled)) {
- __ticket_t head;
-
- BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
- head = xadd(&lock->tickets.head, TICKET_LOCK_INC);
-
- if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
- head &= ~TICKET_SLOWPATH_FLAG;
- __ticket_unlock_kick(lock, (head + TICKET_LOCK_INC));
- }
- } else
- __add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
-}
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
- struct __raw_tickets tmp = READ_ONCE(lock->tickets);
-
- return !__tickets_equal(tmp.tail, tmp.head);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
- struct __raw_tickets tmp = READ_ONCE(lock->tickets);
-
- tmp.head &= ~TICKET_SLOWPATH_FLAG;
- return (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC;
-}
-#define arch_spin_is_contended arch_spin_is_contended
-
-static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
- unsigned long flags)
-{
- arch_spin_lock(lock);
-}
-
-static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
-{
- __ticket_t head = READ_ONCE(lock->tickets.head);
-
- for (;;) {
- struct __raw_tickets tmp = READ_ONCE(lock->tickets);
- /*
- * We need to check "unlocked" in a loop, tmp.head == head
- * can be false positive because of overflow.
- */
- if (__tickets_equal(tmp.head, tmp.tail) ||
- !__tickets_equal(tmp.head, head))
- break;
-
- cpu_relax();
- }
-}
-#endif /* CONFIG_QUEUED_SPINLOCKS */

/*
* Read-write spinlocks, allowing multiple readers
diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h
index 65c3e37f879a..25311ebb446c 100644
--- a/arch/x86/include/asm/spinlock_types.h
+++ b/arch/x86/include/asm/spinlock_types.h
@@ -23,20 +23,7 @@ typedef u32 __ticketpair_t;

#define TICKET_SHIFT (sizeof(__ticket_t) * 8)

-#ifdef CONFIG_QUEUED_SPINLOCKS
#include <asm-generic/qspinlock_types.h>
-#else
-typedef struct arch_spinlock {
- union {
- __ticketpair_t head_tail;
- struct __raw_tickets {
- __ticket_t head, tail;
- } tickets;
- };
-} arch_spinlock_t;
-
-#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
-#endif /* CONFIG_QUEUED_SPINLOCKS */

#include <asm-generic/qrwlock_types.h>

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 1726c4c12336..865058d087ac 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -575,9 +575,6 @@ static void kvm_kick_cpu(int cpu)
kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
}

-
-#ifdef CONFIG_QUEUED_SPINLOCKS
-
#include <asm/qspinlock.h>

static void kvm_wait(u8 *ptr, u8 val)
@@ -606,243 +603,6 @@ static void kvm_wait(u8 *ptr, u8 val)
local_irq_restore(flags);
}

-#else /* !CONFIG_QUEUED_SPINLOCKS */
-
-enum kvm_contention_stat {
- TAKEN_SLOW,
- TAKEN_SLOW_PICKUP,
- RELEASED_SLOW,
- RELEASED_SLOW_KICKED,
- NR_CONTENTION_STATS
-};
-
-#ifdef CONFIG_KVM_DEBUG_FS
-#define HISTO_BUCKETS 30
-
-static struct kvm_spinlock_stats
-{
- u32 contention_stats[NR_CONTENTION_STATS];
- u32 histo_spin_blocked[HISTO_BUCKETS+1];
- u64 time_blocked;
-} spinlock_stats;
-
-static u8 zero_stats;
-
-static inline void check_zero(void)
-{
- u8 ret;
- u8 old;
-
- old = READ_ONCE(zero_stats);
- if (unlikely(old)) {
- ret = cmpxchg(&zero_stats, old, 0);
- /* This ensures only one fellow resets the stat */
- if (ret == old)
- memset(&spinlock_stats, 0, sizeof(spinlock_stats));
- }
-}
-
-static inline void add_stats(enum kvm_contention_stat var, u32 val)
-{
- check_zero();
- spinlock_stats.contention_stats[var] += val;
-}
-
-
-static inline u64 spin_time_start(void)
-{
- return sched_clock();
-}
-
-static void __spin_time_accum(u64 delta, u32 *array)
-{
- unsigned index;
-
- index = ilog2(delta);
- check_zero();
-
- if (index < HISTO_BUCKETS)
- array[index]++;
- else
- array[HISTO_BUCKETS]++;
-}
-
-static inline void spin_time_accum_blocked(u64 start)
-{
- u32 delta;
-
- delta = sched_clock() - start;
- __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
- spinlock_stats.time_blocked += delta;
-}
-
-static struct dentry *d_spin_debug;
-static struct dentry *d_kvm_debug;
-
-static struct dentry *kvm_init_debugfs(void)
-{
- d_kvm_debug = debugfs_create_dir("kvm-guest", NULL);
- if (!d_kvm_debug)
- printk(KERN_WARNING "Could not create 'kvm' debugfs directory\n");
-
- return d_kvm_debug;
-}
-
-static int __init kvm_spinlock_debugfs(void)
-{
- struct dentry *d_kvm;
-
- d_kvm = kvm_init_debugfs();
- if (d_kvm == NULL)
- return -ENOMEM;
-
- d_spin_debug = debugfs_create_dir("spinlocks", d_kvm);
-
- debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
-
- debugfs_create_u32("taken_slow", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[TAKEN_SLOW]);
- debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
-
- debugfs_create_u32("released_slow", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[RELEASED_SLOW]);
- debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
-
- debugfs_create_u64("time_blocked", 0444, d_spin_debug,
- &spinlock_stats.time_blocked);
-
- debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
- spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
-
- return 0;
-}
-fs_initcall(kvm_spinlock_debugfs);
-#else /* !CONFIG_KVM_DEBUG_FS */
-static inline void add_stats(enum kvm_contention_stat var, u32 val)
-{
-}
-
-static inline u64 spin_time_start(void)
-{
- return 0;
-}
-
-static inline void spin_time_accum_blocked(u64 start)
-{
-}
-#endif /* CONFIG_KVM_DEBUG_FS */
-
-struct kvm_lock_waiting {
- struct arch_spinlock *lock;
- __ticket_t want;
-};
-
-/* cpus 'waiting' on a spinlock to become available */
-static cpumask_t waiting_cpus;
-
-/* Track spinlock on which a cpu is waiting */
-static DEFINE_PER_CPU(struct kvm_lock_waiting, klock_waiting);
-
-__visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
-{
- struct kvm_lock_waiting *w;
- int cpu;
- u64 start;
- unsigned long flags;
- __ticket_t head;
-
- if (in_nmi())
- return;
-
- w = this_cpu_ptr(&klock_waiting);
- cpu = smp_processor_id();
- start = spin_time_start();
-
- /*
- * Make sure an interrupt handler can't upset things in a
- * partially setup state.
- */
- local_irq_save(flags);
-
- /*
- * The ordering protocol on this is that the "lock" pointer
- * may only be set non-NULL if the "want" ticket is correct.
- * If we're updating "want", we must first clear "lock".
- */
- w->lock = NULL;
- smp_wmb();
- w->want = want;
- smp_wmb();
- w->lock = lock;
-
- add_stats(TAKEN_SLOW, 1);
-
- /*
- * This uses set_bit, which is atomic but we should not rely on its
- * reordering gurantees. So barrier is needed after this call.
- */
- cpumask_set_cpu(cpu, &waiting_cpus);
-
- barrier();
-
- /*
- * Mark entry to slowpath before doing the pickup test to make
- * sure we don't deadlock with an unlocker.
- */
- __ticket_enter_slowpath(lock);
-
- /* make sure enter_slowpath, which is atomic does not cross the read */
- smp_mb__after_atomic();
-
- /*
- * check again make sure it didn't become free while
- * we weren't looking.
- */
- head = READ_ONCE(lock->tickets.head);
- if (__tickets_equal(head, want)) {
- add_stats(TAKEN_SLOW_PICKUP, 1);
- goto out;
- }
-
- /*
- * halt until it's our turn and kicked. Note that we do safe halt
- * for irq enabled case to avoid hang when lock info is overwritten
- * in irq spinlock slowpath and no spurious interrupt occur to save us.
- */
- if (arch_irqs_disabled_flags(flags))
- halt();
- else
- safe_halt();
-
-out:
- cpumask_clear_cpu(cpu, &waiting_cpus);
- w->lock = NULL;
- local_irq_restore(flags);
- spin_time_accum_blocked(start);
-}
-PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
-
-/* Kick vcpu waiting on @lock->head to reach value @ticket */
-static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
-{
- int cpu;
-
- add_stats(RELEASED_SLOW, 1);
- for_each_cpu(cpu, &waiting_cpus) {
- const struct kvm_lock_waiting *w = &per_cpu(klock_waiting, cpu);
- if (READ_ONCE(w->lock) == lock &&
- READ_ONCE(w->want) == ticket) {
- add_stats(RELEASED_SLOW_KICKED, 1);
- kvm_kick_cpu(cpu);
- break;
- }
- }
-}
-
-#endif /* !CONFIG_QUEUED_SPINLOCKS */
-
/*
* Setup pv_lock_ops to exploit KVM_FEATURE_PV_UNHALT if present.
*/
@@ -854,16 +614,11 @@ void __init kvm_spinlock_init(void)
if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
return;

-#ifdef CONFIG_QUEUED_SPINLOCKS
__pv_init_lock_hash();
pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_lock_ops.wait = kvm_wait;
pv_lock_ops.kick = kvm_kick_cpu;
-#else /* !CONFIG_QUEUED_SPINLOCKS */
- pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
- pv_lock_ops.unlock_kick = kvm_unlock_kick;
-#endif
}

static __init int kvm_spinlock_init_jump(void)
diff --git a/arch/x86/kernel/paravirt-spinlocks.c b/arch/x86/kernel/paravirt-spinlocks.c
index 1939a0269377..2c55a003b793 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -8,7 +8,6 @@

#include <asm/paravirt.h>

-#ifdef CONFIG_QUEUED_SPINLOCKS
__visible void __native_queued_spin_unlock(struct qspinlock *lock)
{
native_queued_spin_unlock(lock);
@@ -21,19 +20,13 @@ bool pv_is_native_spin_unlock(void)
return pv_lock_ops.queued_spin_unlock.func ==
__raw_callee_save___native_queued_spin_unlock;
}
-#endif

struct pv_lock_ops pv_lock_ops = {
#ifdef CONFIG_SMP
-#ifdef CONFIG_QUEUED_SPINLOCKS
.queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
.queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
.wait = paravirt_nop,
.kick = paravirt_nop,
-#else /* !CONFIG_QUEUED_SPINLOCKS */
- .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop),
- .unlock_kick = paravirt_nop,
-#endif /* !CONFIG_QUEUED_SPINLOCKS */
#endif /* SMP */
};
EXPORT_SYMBOL(pv_lock_ops);
diff --git a/arch/x86/kernel/paravirt_patch_32.c b/arch/x86/kernel/paravirt_patch_32.c
index 158dc0650d5d..920c6ae08592 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -10,7 +10,7 @@ DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
DEF_NATIVE(pv_cpu_ops, clts, "clts");

-#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
+#if defined(CONFIG_PARAVIRT_SPINLOCKS)
DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
#endif

@@ -49,7 +49,7 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
PATCH_SITE(pv_mmu_ops, read_cr3);
PATCH_SITE(pv_mmu_ops, write_cr3);
PATCH_SITE(pv_cpu_ops, clts);
-#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
+#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
start = start_pv_lock_ops_queued_spin_unlock;
diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c
index e70087a04cc8..bb3840cedb4f 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -19,7 +19,7 @@ DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
DEF_NATIVE(, mov32, "mov %edi, %eax");
DEF_NATIVE(, mov64, "mov %rdi, %rax");

-#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
+#if defined(CONFIG_PARAVIRT_SPINLOCKS)
DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
#endif

@@ -61,7 +61,7 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
PATCH_SITE(pv_cpu_ops, clts);
PATCH_SITE(pv_mmu_ops, flush_tlb_single);
PATCH_SITE(pv_cpu_ops, wbinvd);
-#if defined(CONFIG_PARAVIRT_SPINLOCKS) && defined(CONFIG_QUEUED_SPINLOCKS)
+#if defined(CONFIG_PARAVIRT_SPINLOCKS)
case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock):
if (pv_is_native_spin_unlock()) {
start = start_pv_lock_ops_queued_spin_unlock;
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index f42e78de1e10..3d6e0064cbfc 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -21,8 +21,6 @@ static DEFINE_PER_CPU(int, lock_kicker_irq) = -1;
static DEFINE_PER_CPU(char *, irq_name);
static bool xen_pvspin = true;

-#ifdef CONFIG_QUEUED_SPINLOCKS
-
#include <asm/qspinlock.h>

static void xen_qlock_kick(int cpu)
@@ -71,207 +69,6 @@ static void xen_qlock_wait(u8 *byte, u8 val)
xen_poll_irq(irq);
}

-#else /* CONFIG_QUEUED_SPINLOCKS */
-
-enum xen_contention_stat {
- TAKEN_SLOW,
- TAKEN_SLOW_PICKUP,
- TAKEN_SLOW_SPURIOUS,
- RELEASED_SLOW,
- RELEASED_SLOW_KICKED,
- NR_CONTENTION_STATS
-};
-
-
-#ifdef CONFIG_XEN_DEBUG_FS
-#define HISTO_BUCKETS 30
-static struct xen_spinlock_stats
-{
- u32 contention_stats[NR_CONTENTION_STATS];
- u32 histo_spin_blocked[HISTO_BUCKETS+1];
- u64 time_blocked;
-} spinlock_stats;
-
-static u8 zero_stats;
-
-static inline void check_zero(void)
-{
- u8 ret;
- u8 old = READ_ONCE(zero_stats);
- if (unlikely(old)) {
- ret = cmpxchg(&zero_stats, old, 0);
- /* This ensures only one fellow resets the stat */
- if (ret == old)
- memset(&spinlock_stats, 0, sizeof(spinlock_stats));
- }
-}
-
-static inline void add_stats(enum xen_contention_stat var, u32 val)
-{
- check_zero();
- spinlock_stats.contention_stats[var] += val;
-}
-
-static inline u64 spin_time_start(void)
-{
- return xen_clocksource_read();
-}
-
-static void __spin_time_accum(u64 delta, u32 *array)
-{
- unsigned index = ilog2(delta);
-
- check_zero();
-
- if (index < HISTO_BUCKETS)
- array[index]++;
- else
- array[HISTO_BUCKETS]++;
-}
-
-static inline void spin_time_accum_blocked(u64 start)
-{
- u32 delta = xen_clocksource_read() - start;
-
- __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
- spinlock_stats.time_blocked += delta;
-}
-#else /* !CONFIG_XEN_DEBUG_FS */
-static inline void add_stats(enum xen_contention_stat var, u32 val)
-{
-}
-
-static inline u64 spin_time_start(void)
-{
- return 0;
-}
-
-static inline void spin_time_accum_blocked(u64 start)
-{
-}
-#endif /* CONFIG_XEN_DEBUG_FS */
-
-struct xen_lock_waiting {
- struct arch_spinlock *lock;
- __ticket_t want;
-};
-
-static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting);
-static cpumask_t waiting_cpus;
-
-__visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
-{
- int irq = __this_cpu_read(lock_kicker_irq);
- struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
- int cpu = smp_processor_id();
- u64 start;
- __ticket_t head;
- unsigned long flags;
-
- /* If kicker interrupts not initialized yet, just spin */
- if (irq == -1)
- return;
-
- start = spin_time_start();
-
- /*
- * Make sure an interrupt handler can't upset things in a
- * partially setup state.
- */
- local_irq_save(flags);
- /*
- * We don't really care if we're overwriting some other
- * (lock,want) pair, as that would mean that we're currently
- * in an interrupt context, and the outer context had
- * interrupts enabled. That has already kicked the VCPU out
- * of xen_poll_irq(), so it will just return spuriously and
- * retry with newly setup (lock,want).
- *
- * The ordering protocol on this is that the "lock" pointer
- * may only be set non-NULL if the "want" ticket is correct.
- * If we're updating "want", we must first clear "lock".
- */
- w->lock = NULL;
- smp_wmb();
- w->want = want;
- smp_wmb();
- w->lock = lock;
-
- /* This uses set_bit, which atomic and therefore a barrier */
- cpumask_set_cpu(cpu, &waiting_cpus);
- add_stats(TAKEN_SLOW, 1);
-
- /* clear pending */
- xen_clear_irq_pending(irq);
-
- /* Only check lock once pending cleared */
- barrier();
-
- /*
- * Mark entry to slowpath before doing the pickup test to make
- * sure we don't deadlock with an unlocker.
- */
- __ticket_enter_slowpath(lock);
-
- /* make sure enter_slowpath, which is atomic does not cross the read */
- smp_mb__after_atomic();
-
- /*
- * check again make sure it didn't become free while
- * we weren't looking
- */
- head = READ_ONCE(lock->tickets.head);
- if (__tickets_equal(head, want)) {
- add_stats(TAKEN_SLOW_PICKUP, 1);
- goto out;
- }
-
- /* Allow interrupts while blocked */
- local_irq_restore(flags);
-
- /*
- * If an interrupt happens here, it will leave the wakeup irq
- * pending, which will cause xen_poll_irq() to return
- * immediately.
- */
-
- /* Block until irq becomes pending (or perhaps a spurious wakeup) */
- xen_poll_irq(irq);
- add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
-
- local_irq_save(flags);
-
- kstat_incr_irq_this_cpu(irq);
-out:
- cpumask_clear_cpu(cpu, &waiting_cpus);
- w->lock = NULL;
-
- local_irq_restore(flags);
-
- spin_time_accum_blocked(start);
-}
-PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
-
-static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
-{
- int cpu;
-
- add_stats(RELEASED_SLOW, 1);
-
- for_each_cpu(cpu, &waiting_cpus) {
- const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
-
- /* Make sure we read lock before want */
- if (READ_ONCE(w->lock) == lock &&
- READ_ONCE(w->want) == next) {
- add_stats(RELEASED_SLOW_KICKED, 1);
- xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
- break;
- }
- }
-}
-#endif /* CONFIG_QUEUED_SPINLOCKS */
-
static irqreturn_t dummy_handler(int irq, void *dev_id)
{
BUG();
@@ -334,16 +131,12 @@ void __init xen_init_spinlocks(void)
return;
}
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
-#ifdef CONFIG_QUEUED_SPINLOCKS
+
__pv_init_lock_hash();
pv_lock_ops.queued_spin_lock_slowpath = __pv_queued_spin_lock_slowpath;
pv_lock_ops.queued_spin_unlock = PV_CALLEE_SAVE(__pv_queued_spin_unlock);
pv_lock_ops.wait = xen_qlock_wait;
pv_lock_ops.kick = xen_qlock_kick;
-#else
- pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning);
- pv_lock_ops.unlock_kick = xen_unlock_kick;
-#endif
}

/*
@@ -372,44 +165,3 @@ static __init int xen_parse_nopvspin(char *arg)
}
early_param("xen_nopvspin", xen_parse_nopvspin);

-#if defined(CONFIG_XEN_DEBUG_FS) && !defined(CONFIG_QUEUED_SPINLOCKS)
-
-static struct dentry *d_spin_debug;
-
-static int __init xen_spinlock_debugfs(void)
-{
- struct dentry *d_xen = xen_init_debugfs();
-
- if (d_xen == NULL)
- return -ENOMEM;
-
- if (!xen_pvspin)
- return 0;
-
- d_spin_debug = debugfs_create_dir("spinlocks", d_xen);
-
- debugfs_create_u8("zero_stats", 0644, d_spin_debug, &zero_stats);
-
- debugfs_create_u32("taken_slow", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[TAKEN_SLOW]);
- debugfs_create_u32("taken_slow_pickup", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[TAKEN_SLOW_PICKUP]);
- debugfs_create_u32("taken_slow_spurious", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[TAKEN_SLOW_SPURIOUS]);
-
- debugfs_create_u32("released_slow", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[RELEASED_SLOW]);
- debugfs_create_u32("released_slow_kicked", 0444, d_spin_debug,
- &spinlock_stats.contention_stats[RELEASED_SLOW_KICKED]);
-
- debugfs_create_u64("time_blocked", 0444, d_spin_debug,
- &spinlock_stats.time_blocked);
-
- debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug,
- spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1);
-
- return 0;
-}
-fs_initcall(xen_spinlock_debugfs);
-
-#endif /* CONFIG_XEN_DEBUG_FS */
diff --git a/fs/Kconfig b/fs/Kconfig
index 2bc7ad775842..3ef62bad8f2b 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -79,6 +79,7 @@ config EXPORTFS_BLOCK_OPS
config FILE_LOCKING
bool "Enable POSIX file locking API" if EXPERT
default y
+ select PERCPU_RWSEM
help
This option enables standard file locking support, required
for filesystems like NFS and for the flock() system
diff --git a/fs/locks.c b/fs/locks.c
index ee1b15f6fc13..133fb2543d21 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -127,7 +127,6 @@
#include <linux/pid_namespace.h>
#include <linux/hashtable.h>
#include <linux/percpu.h>
-#include <linux/lglock.h>

#define CREATE_TRACE_POINTS
#include <trace/events/filelock.h>
@@ -158,12 +157,18 @@ int lease_break_time = 45;

/*
* The global file_lock_list is only used for displaying /proc/locks, so we
- * keep a list on each CPU, with each list protected by its own spinlock via
- * the file_lock_lglock. Note that alterations to the list also require that
- * the relevant flc_lock is held.
+ * keep a list on each CPU, with each list protected by its own spinlock.
+ * Global serialization is done using file_rwsem.
+ *
+ * Note that alterations to the list also require that the relevant flc_lock is
+ * held.
*/
-DEFINE_STATIC_LGLOCK(file_lock_lglock);
-static DEFINE_PER_CPU(struct hlist_head, file_lock_list);
+struct file_lock_list_struct {
+ spinlock_t lock;
+ struct hlist_head hlist;
+};
+static DEFINE_PER_CPU(struct file_lock_list_struct, file_lock_list);
+DEFINE_STATIC_PERCPU_RWSEM(file_rwsem);

/*
* The blocked_hash is used to find POSIX lock loops for deadlock detection.
@@ -587,15 +592,23 @@ static int posix_same_owner(struct file_lock *fl1, struct file_lock *fl2)
/* Must be called with the flc_lock held! */
static void locks_insert_global_locks(struct file_lock *fl)
{
- lg_local_lock(&file_lock_lglock);
+ struct file_lock_list_struct *fll = this_cpu_ptr(&file_lock_list);
+
+ percpu_rwsem_assert_held(&file_rwsem);
+
+ spin_lock(&fll->lock);
fl->fl_link_cpu = smp_processor_id();
- hlist_add_head(&fl->fl_link, this_cpu_ptr(&file_lock_list));
- lg_local_unlock(&file_lock_lglock);
+ hlist_add_head(&fl->fl_link, &fll->hlist);
+ spin_unlock(&fll->lock);
}

/* Must be called with the flc_lock held! */
static void locks_delete_global_locks(struct file_lock *fl)
{
+ struct file_lock_list_struct *fll;
+
+ percpu_rwsem_assert_held(&file_rwsem);
+
/*
* Avoid taking lock if already unhashed. This is safe since this check
* is done while holding the flc_lock, and new insertions into the list
@@ -603,9 +616,11 @@ static void locks_delete_global_locks(struct file_lock *fl)
*/
if (hlist_unhashed(&fl->fl_link))
return;
- lg_local_lock_cpu(&file_lock_lglock, fl->fl_link_cpu);
+
+ fll = per_cpu_ptr(&file_lock_list, fl->fl_link_cpu);
+ spin_lock(&fll->lock);
hlist_del_init(&fl->fl_link);
- lg_local_unlock_cpu(&file_lock_lglock, fl->fl_link_cpu);
+ spin_unlock(&fll->lock);
}

static unsigned long
@@ -915,6 +930,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)
return -ENOMEM;
}

+ percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
if (request->fl_flags & FL_ACCESS)
goto find_conflict;
@@ -955,6 +971,7 @@ static int flock_lock_inode(struct inode *inode, struct file_lock *request)

out:
spin_unlock(&ctx->flc_lock);
+ percpu_up_read_preempt_enable(&file_rwsem);
if (new_fl)
locks_free_lock(new_fl);
locks_dispose_list(&dispose);
@@ -991,6 +1008,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
new_fl2 = locks_alloc_lock();
}

+ percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
/*
* New lock request. Walk all POSIX locks and look for conflicts. If
@@ -1162,6 +1180,7 @@ static int posix_lock_inode(struct inode *inode, struct file_lock *request,
}
out:
spin_unlock(&ctx->flc_lock);
+ percpu_up_read_preempt_enable(&file_rwsem);
/*
* Free any unused locks.
*/
@@ -1436,6 +1455,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
return error;
}

+ percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);

time_out_leases(inode, &dispose);
@@ -1487,9 +1507,13 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
locks_insert_block(fl, new_fl);
trace_break_lease_block(inode, new_fl);
spin_unlock(&ctx->flc_lock);
+ percpu_up_read_preempt_enable(&file_rwsem);
+
locks_dispose_list(&dispose);
error = wait_event_interruptible_timeout(new_fl->fl_wait,
!new_fl->fl_next, break_time);
+
+ percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
trace_break_lease_unblock(inode, new_fl);
locks_delete_block(new_fl);
@@ -1506,6 +1530,7 @@ int __break_lease(struct inode *inode, unsigned int mode, unsigned int type)
}
out:
spin_unlock(&ctx->flc_lock);
+ percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
locks_free_lock(new_fl);
return error;
@@ -1660,6 +1685,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
return -EINVAL;
}

+ percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
time_out_leases(inode, &dispose);
error = check_conflicting_open(dentry, arg, lease->fl_flags);
@@ -1730,6 +1756,7 @@ generic_add_lease(struct file *filp, long arg, struct file_lock **flp, void **pr
lease->fl_lmops->lm_setup(lease, priv);
out:
spin_unlock(&ctx->flc_lock);
+ percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
if (is_deleg)
inode_unlock(inode);
@@ -1752,6 +1779,7 @@ static int generic_delete_lease(struct file *filp, void *owner)
return error;
}

+ percpu_down_read_preempt_disable(&file_rwsem);
spin_lock(&ctx->flc_lock);
list_for_each_entry(fl, &ctx->flc_lease, fl_list) {
if (fl->fl_file == filp &&
@@ -1764,6 +1792,7 @@ static int generic_delete_lease(struct file *filp, void *owner)
if (victim)
error = fl->fl_lmops->lm_change(victim, F_UNLCK, &dispose);
spin_unlock(&ctx->flc_lock);
+ percpu_up_read_preempt_enable(&file_rwsem);
locks_dispose_list(&dispose);
return error;
}
@@ -2703,9 +2732,9 @@ static void *locks_start(struct seq_file *f, loff_t *pos)
struct locks_iterator *iter = f->private;

iter->li_pos = *pos + 1;
- lg_global_lock(&file_lock_lglock);
+ percpu_down_write(&file_rwsem);
spin_lock(&blocked_lock_lock);
- return seq_hlist_start_percpu(&file_lock_list, &iter->li_cpu, *pos);
+ return seq_hlist_start_percpu(&file_lock_list.hlist, &iter->li_cpu, *pos);
}

static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
@@ -2713,14 +2742,14 @@ static void *locks_next(struct seq_file *f, void *v, loff_t *pos)
struct locks_iterator *iter = f->private;

++iter->li_pos;
- return seq_hlist_next_percpu(v, &file_lock_list, &iter->li_cpu, pos);
+ return seq_hlist_next_percpu(v, &file_lock_list.hlist, &iter->li_cpu, pos);
}

static void locks_stop(struct seq_file *f, void *v)
__releases(&blocked_lock_lock)
{
spin_unlock(&blocked_lock_lock);
- lg_global_unlock(&file_lock_lglock);
+ percpu_up_write(&file_rwsem);
}

static const struct seq_operations locks_seq_operations = {
@@ -2761,10 +2790,13 @@ static int __init filelock_init(void)
filelock_cache = kmem_cache_create("file_lock_cache",
sizeof(struct file_lock), 0, SLAB_PANIC, NULL);

- lg_lock_init(&file_lock_lglock, "file_lock_lglock");

- for_each_possible_cpu(i)
- INIT_HLIST_HEAD(per_cpu_ptr(&file_lock_list, i));
+ for_each_possible_cpu(i) {
+ struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
+
+ spin_lock_init(&fll->lock);
+ INIT_HLIST_HEAD(&fll->hlist);
+ }

return 0;
}
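
For orientation, the resulting two-level scheme in fs/locks.c is roughly the
following (a condensed sketch of the call pattern, not the literal kernel
code; error handling and the dispose lists are omitted):

	/* Update side: file_rwsem for read, then the per-inode lock. */
	percpu_down_read_preempt_disable(&file_rwsem);
	spin_lock(&ctx->flc_lock);
	/* ... insert/delete on this CPU's file_lock_list under fll->lock ... */
	spin_unlock(&ctx->flc_lock);
	percpu_up_read_preempt_enable(&file_rwsem);

	/* /proc/locks side: the only writer of file_rwsem, excludes all updates. */
	percpu_down_write(&file_rwsem);
	spin_lock(&blocked_lock_lock);
	/* ... walk every CPU's hlist via seq_hlist_*_percpu() ... */
	spin_unlock(&blocked_lock_lock);
	percpu_up_write(&file_rwsem);

So lock updates only ever touch their own CPU's spinlock, while the rare
/proc/locks read pays the global cost.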
diff --git a/include/linux/lglock.h b/include/linux/lglock.h
deleted file mode 100644
index c92ebd100d9b..000000000000
--- a/include/linux/lglock.h
+++ /dev/null
@@ -1,81 +0,0 @@
-/*
- * Specialised local-global spinlock. Can only be declared as global variables
- * to avoid overhead and keep things simple (and we don't want to start using
- * these inside dynamically allocated structures).
- *
- * "local/global locks" (lglocks) can be used to:
- *
- * - Provide fast exclusive access to per-CPU data, with exclusive access to
- * another CPU's data allowed but possibly subject to contention, and to
- * provide very slow exclusive access to all per-CPU data.
- * - Or to provide very fast and scalable read serialisation, and to provide
- * very slow exclusive serialisation of data (not necessarily per-CPU data).
- *
- * Brlocks are also implemented as a short-hand notation for the latter use
- * case.
- *
- * Copyright 2009, 2010, Nick Piggin, Novell Inc.
- */
-#ifndef __LINUX_LGLOCK_H
-#define __LINUX_LGLOCK_H
-
-#include <linux/spinlock.h>
-#include <linux/lockdep.h>
-#include <linux/percpu.h>
-#include <linux/cpu.h>
-#include <linux/notifier.h>
-
-#ifdef CONFIG_SMP
-
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-#define LOCKDEP_INIT_MAP lockdep_init_map
-#else
-#define LOCKDEP_INIT_MAP(a, b, c, d)
-#endif
-
-struct lglock {
- arch_spinlock_t __percpu *lock;
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
- struct lock_class_key lock_key;
- struct lockdep_map lock_dep_map;
-#endif
-};
-
-#define DEFINE_LGLOCK(name) \
- static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock) \
- = __ARCH_SPIN_LOCK_UNLOCKED; \
- struct lglock name = { .lock = &name ## _lock }
-
-#define DEFINE_STATIC_LGLOCK(name) \
- static DEFINE_PER_CPU(arch_spinlock_t, name ## _lock) \
- = __ARCH_SPIN_LOCK_UNLOCKED; \
- static struct lglock name = { .lock = &name ## _lock }
-
-void lg_lock_init(struct lglock *lg, char *name);
-
-void lg_local_lock(struct lglock *lg);
-void lg_local_unlock(struct lglock *lg);
-void lg_local_lock_cpu(struct lglock *lg, int cpu);
-void lg_local_unlock_cpu(struct lglock *lg, int cpu);
-
-void lg_double_lock(struct lglock *lg, int cpu1, int cpu2);
-void lg_double_unlock(struct lglock *lg, int cpu1, int cpu2);
-
-void lg_global_lock(struct lglock *lg);
-void lg_global_unlock(struct lglock *lg);
-
-#else
-/* When !CONFIG_SMP, map lglock to spinlock */
-#define lglock spinlock
-#define DEFINE_LGLOCK(name) DEFINE_SPINLOCK(name)
-#define DEFINE_STATIC_LGLOCK(name) static DEFINE_SPINLOCK(name)
-#define lg_lock_init(lg, name) spin_lock_init(lg)
-#define lg_local_lock spin_lock
-#define lg_local_unlock spin_unlock
-#define lg_local_lock_cpu(lg, cpu) spin_lock(lg)
-#define lg_local_unlock_cpu(lg, cpu) spin_unlock(lg)
-#define lg_global_lock spin_lock
-#define lg_global_unlock spin_unlock
-#endif
-
-#endif
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index c2fa3ecb0dce..5b2e6159b744 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -10,32 +10,122 @@

struct percpu_rw_semaphore {
struct rcu_sync rss;
- unsigned int __percpu *fast_read_ctr;
+ unsigned int __percpu *read_count;
struct rw_semaphore rw_sem;
- atomic_t slow_read_ctr;
- wait_queue_head_t write_waitq;
+ wait_queue_head_t writer;
+ int readers_block;
};

-extern void percpu_down_read(struct percpu_rw_semaphore *);
-extern int percpu_down_read_trylock(struct percpu_rw_semaphore *);
-extern void percpu_up_read(struct percpu_rw_semaphore *);
+#define DEFINE_STATIC_PERCPU_RWSEM(name) \
+static DEFINE_PER_CPU(unsigned int, __percpu_rwsem_rc_##name); \
+static struct percpu_rw_semaphore name = { \
+ .rss = __RCU_SYNC_INITIALIZER(name.rss, RCU_SCHED_SYNC), \
+ .read_count = &__percpu_rwsem_rc_##name, \
+ .rw_sem = __RWSEM_INITIALIZER(name.rw_sem), \
+ .writer = __WAIT_QUEUE_HEAD_INITIALIZER(name.writer), \
+}
+
+extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
+extern void __percpu_up_read(struct percpu_rw_semaphore *);
+
+static inline void percpu_down_read_preempt_disable(struct percpu_rw_semaphore *sem)
+{
+ might_sleep();
+
+ rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
+
+ preempt_disable();
+ /*
+ * We are in an RCU-sched read-side critical section, so the writer
+ * cannot both change sem->state from readers_fast and start checking
+ * counters while we are here. So if we see !sem->state, we know that
+ * the writer won't be checking until we're past the preempt_enable()
+ * and that once the synchronize_sched() is done, the writer will see
+ * anything we did within this RCU-sched read-side critical section.
+ */
+ __this_cpu_inc(*sem->read_count);
+ if (unlikely(!rcu_sync_is_idle(&sem->rss)))
+ __percpu_down_read(sem, false); /* Unconditional memory barrier */
+ barrier();
+ /*
+ * The barrier() prevents the compiler from
+ * bleeding the critical section out.
+ */
+}
+
+static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
+{
+ percpu_down_read_preempt_disable(sem);
+ preempt_enable();
+}
+
+static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
+{
+ int ret = 1;
+
+ preempt_disable();
+ /*
+ * Same as in percpu_down_read().
+ */
+ __this_cpu_inc(*sem->read_count);
+ if (unlikely(!rcu_sync_is_idle(&sem->rss)))
+ ret = __percpu_down_read(sem, true); /* Unconditional memory barrier */
+ preempt_enable();
+ /*
+ * The barrier() from preempt_enable() prevents the compiler from
+ * bleeding the critical section out.
+ */
+
+ if (ret)
+ rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_);
+
+ return ret;
+}
+
+static inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem)
+{
+ /*
+ * The barrier() prevents the compiler from
+ * bleeding the critical section out.
+ */
+ barrier();
+ /*
+ * Same as in percpu_down_read().
+ */
+ if (likely(rcu_sync_is_idle(&sem->rss)))
+ __this_cpu_dec(*sem->read_count);
+ else
+ __percpu_up_read(sem); /* Unconditional memory barrier */
+ preempt_enable();
+
+ rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_);
+}
+
+static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
+{
+ preempt_disable();
+ percpu_up_read_preempt_enable(sem);
+}

extern void percpu_down_write(struct percpu_rw_semaphore *);
extern void percpu_up_write(struct percpu_rw_semaphore *);

extern int __percpu_init_rwsem(struct percpu_rw_semaphore *,
const char *, struct lock_class_key *);
+
extern void percpu_free_rwsem(struct percpu_rw_semaphore *);

-#define percpu_init_rwsem(brw) \
+#define percpu_init_rwsem(sem) \
({ \
static struct lock_class_key rwsem_key; \
- __percpu_init_rwsem(brw, #brw, &rwsem_key); \
+ __percpu_init_rwsem(sem, #sem, &rwsem_key); \
})

-
#define percpu_rwsem_is_held(sem) lockdep_is_held(&(sem)->rw_sem)

+#define percpu_rwsem_assert_held(sem) \
+ lockdep_assert_held(&(sem)->rw_sem)
+
static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
bool read, unsigned long ip)
{
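
A minimal usage sketch of the reworked API (my_sem and my_data are made-up
names for illustration; note that the _preempt_disable/_preempt_enable
variants must be paired, as the fs/locks conversion above does):

	DEFINE_STATIC_PERCPU_RWSEM(my_sem);

	static int my_data;

	static void my_reader(void)
	{
		percpu_down_read(&my_sem);	/* fast path: per-CPU increment */
		(void)my_data;			/* read-side critical section */
		percpu_up_read(&my_sem);
	}

	static void my_writer(void)
	{
		percpu_down_write(&my_sem);	/* forces readers onto the slow path */
		my_data++;
		percpu_up_write(&my_sem);
	}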
diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
index a63a33e6196e..ece7ed9a4a70 100644
--- a/include/linux/rcu_sync.h
+++ b/include/linux/rcu_sync.h
@@ -59,6 +59,7 @@ static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
}

extern void rcu_sync_init(struct rcu_sync *, enum rcu_sync_type);
+extern void rcu_sync_enter_start(struct rcu_sync *);
extern void rcu_sync_enter(struct rcu_sync *);
extern void rcu_sync_exit(struct rcu_sync *);
extern void rcu_sync_dtor(struct rcu_sync *);
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index d6b729beba49..9ba28310eab6 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5627,6 +5627,12 @@ int __init cgroup_init(void)
BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));

+ /*
+ * The latency of the synchronize_sched() is too high for cgroups;
+ * avoid it at the cost of forcing all readers into the slow path.
+ */
+ rcu_sync_enter_start(&cgroup_threadgroup_rwsem.rss);
+
get_user_ns(init_cgroup_ns.user_ns);

mutex_lock(&cgroup_mutex);
diff --git a/kernel/futex.c b/kernel/futex.c
index 46cb3a301bc1..2c4be467fecd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -381,8 +381,12 @@ static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
#endif
}

-/*
- * We hash on the keys returned from get_futex_key (see below).
+/**
+ * hash_futex - Return the hash bucket in the global hash
+ * @key: Pointer to the futex key for which the hash is calculated
+ *
+ * We hash on the keys returned from get_futex_key (see below) and return the
+ * corresponding hash bucket in the global hash.
*/
static struct futex_hash_bucket *hash_futex(union futex_key *key)
{
@@ -392,7 +396,12 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
return &futex_queues[hash & (futex_hashsize - 1)];
}

-/*
+
+/**
+ * match_futex - Check whether two futex keys are equal
+ * @key1: Pointer to key1
+ * @key2: Pointer to key2
+ *
* Return 1 if two futex_keys are equal, 0 otherwise.
*/
static inline int match_futex(union futex_key *key1, union futex_key *key2)
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index d234022805dc..432c3d71d195 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -117,7 +117,7 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
" disables this message.\n");
sched_show_task(t);
- debug_show_held_locks(t);
+ debug_show_all_locks();

touch_nmi_watchdog();

diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index 31322a4275cd..6f88e352cd4f 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -18,7 +18,6 @@ obj-$(CONFIG_LOCKDEP) += lockdep_proc.o
endif
obj-$(CONFIG_SMP) += spinlock.o
obj-$(CONFIG_LOCK_SPIN_ON_OWNER) += osq_lock.o
-obj-$(CONFIG_SMP) += lglock.o
obj-$(CONFIG_PROVE_LOCKING) += spinlock.o
obj-$(CONFIG_QUEUED_SPINLOCKS) += qspinlock.o
obj-$(CONFIG_RT_MUTEXES) += rtmutex.o
diff --git a/kernel/locking/lglock.c b/kernel/locking/lglock.c
deleted file mode 100644
index 951cfcd10b4a..000000000000
--- a/kernel/locking/lglock.c
+++ /dev/null
@@ -1,111 +0,0 @@
-/* See include/linux/lglock.h for description */
-#include <linux/module.h>
-#include <linux/lglock.h>
-#include <linux/cpu.h>
-#include <linux/string.h>
-
-/*
- * Note there is no uninit, so lglocks cannot be defined in
- * modules (but it's fine to use them from there)
- * Could be added though, just undo lg_lock_init
- */
-
-void lg_lock_init(struct lglock *lg, char *name)
-{
- LOCKDEP_INIT_MAP(&lg->lock_dep_map, name, &lg->lock_key, 0);
-}
-EXPORT_SYMBOL(lg_lock_init);
-
-void lg_local_lock(struct lglock *lg)
-{
- arch_spinlock_t *lock;
-
- preempt_disable();
- lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
- lock = this_cpu_ptr(lg->lock);
- arch_spin_lock(lock);
-}
-EXPORT_SYMBOL(lg_local_lock);
-
-void lg_local_unlock(struct lglock *lg)
-{
- arch_spinlock_t *lock;
-
- lock_release(&lg->lock_dep_map, 1, _RET_IP_);
- lock = this_cpu_ptr(lg->lock);
- arch_spin_unlock(lock);
- preempt_enable();
-}
-EXPORT_SYMBOL(lg_local_unlock);
-
-void lg_local_lock_cpu(struct lglock *lg, int cpu)
-{
- arch_spinlock_t *lock;
-
- preempt_disable();
- lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
- lock = per_cpu_ptr(lg->lock, cpu);
- arch_spin_lock(lock);
-}
-EXPORT_SYMBOL(lg_local_lock_cpu);
-
-void lg_local_unlock_cpu(struct lglock *lg, int cpu)
-{
- arch_spinlock_t *lock;
-
- lock_release(&lg->lock_dep_map, 1, _RET_IP_);
- lock = per_cpu_ptr(lg->lock, cpu);
- arch_spin_unlock(lock);
- preempt_enable();
-}
-EXPORT_SYMBOL(lg_local_unlock_cpu);
-
-void lg_double_lock(struct lglock *lg, int cpu1, int cpu2)
-{
- BUG_ON(cpu1 == cpu2);
-
- /* lock in cpu order, just like lg_global_lock */
- if (cpu2 < cpu1)
- swap(cpu1, cpu2);
-
- preempt_disable();
- lock_acquire_shared(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
- arch_spin_lock(per_cpu_ptr(lg->lock, cpu1));
- arch_spin_lock(per_cpu_ptr(lg->lock, cpu2));
-}
-
-void lg_double_unlock(struct lglock *lg, int cpu1, int cpu2)
-{
- lock_release(&lg->lock_dep_map, 1, _RET_IP_);
- arch_spin_unlock(per_cpu_ptr(lg->lock, cpu1));
- arch_spin_unlock(per_cpu_ptr(lg->lock, cpu2));
- preempt_enable();
-}
-
-void lg_global_lock(struct lglock *lg)
-{
- int i;
-
- preempt_disable();
- lock_acquire_exclusive(&lg->lock_dep_map, 0, 0, NULL, _RET_IP_);
- for_each_possible_cpu(i) {
- arch_spinlock_t *lock;
- lock = per_cpu_ptr(lg->lock, i);
- arch_spin_lock(lock);
- }
-}
-EXPORT_SYMBOL(lg_global_lock);
-
-void lg_global_unlock(struct lglock *lg)
-{
- int i;
-
- lock_release(&lg->lock_dep_map, 1, _RET_IP_);
- for_each_possible_cpu(i) {
- arch_spinlock_t *lock;
- lock = per_cpu_ptr(lg->lock, i);
- arch_spin_unlock(lock);
- }
- preempt_enable();
-}
-EXPORT_SYMBOL(lg_global_unlock);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index bec0b647f9cc..ce182599cf2e 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -8,152 +8,186 @@
#include <linux/sched.h>
#include <linux/errno.h>

-int __percpu_init_rwsem(struct percpu_rw_semaphore *brw,
+int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
const char *name, struct lock_class_key *rwsem_key)
{
- brw->fast_read_ctr = alloc_percpu(int);
- if (unlikely(!brw->fast_read_ctr))
+ sem->read_count = alloc_percpu(int);
+ if (unlikely(!sem->read_count))
return -ENOMEM;

/* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */
- __init_rwsem(&brw->rw_sem, name, rwsem_key);
- rcu_sync_init(&brw->rss, RCU_SCHED_SYNC);
- atomic_set(&brw->slow_read_ctr, 0);
- init_waitqueue_head(&brw->write_waitq);
+ rcu_sync_init(&sem->rss, RCU_SCHED_SYNC);
+ __init_rwsem(&sem->rw_sem, name, rwsem_key);
+ init_waitqueue_head(&sem->writer);
+ sem->readers_block = 0;
return 0;
}
EXPORT_SYMBOL_GPL(__percpu_init_rwsem);

-void percpu_free_rwsem(struct percpu_rw_semaphore *brw)
+void percpu_free_rwsem(struct percpu_rw_semaphore *sem)
{
/*
* XXX: temporary kludge. The error path in alloc_super()
* assumes that percpu_free_rwsem() is safe after kzalloc().
*/
- if (!brw->fast_read_ctr)
+ if (!sem->read_count)
return;

- rcu_sync_dtor(&brw->rss);
- free_percpu(brw->fast_read_ctr);
- brw->fast_read_ctr = NULL; /* catch use after free bugs */
+ rcu_sync_dtor(&sem->rss);
+ free_percpu(sem->read_count);
+ sem->read_count = NULL; /* catch use after free bugs */
}
EXPORT_SYMBOL_GPL(percpu_free_rwsem);

-/*
- * This is the fast-path for down_read/up_read. If it succeeds we rely
- * on the barriers provided by rcu_sync_enter/exit; see the comments in
- * percpu_down_write() and percpu_up_write().
- *
- * If this helper fails the callers rely on the normal rw_semaphore and
- * atomic_dec_and_test(), so in this case we have the necessary barriers.
- */
-static bool update_fast_ctr(struct percpu_rw_semaphore *brw, unsigned int val)
+int __percpu_down_read(struct percpu_rw_semaphore *sem, int try)
{
- bool success;
+ /*
+ * Due to having preemption disabled the decrement happens on
+ * the same CPU as the increment, avoiding the
+ * increment-on-one-CPU-and-decrement-on-another problem.
+ *
+ * If the reader misses the writer's assignment of readers_block, then
+ * the writer is guaranteed to see the reader's increment.
+ *
+ * Conversely, any readers that increment their sem->read_count after
+ * the writer looks are guaranteed to see the readers_block value,
+ * which in turn means that they are guaranteed to immediately
+ * decrement their sem->read_count, so that it doesn't matter that the
+ * writer missed them.
+ */

- preempt_disable();
- success = rcu_sync_is_idle(&brw->rss);
- if (likely(success))
- __this_cpu_add(*brw->fast_read_ctr, val);
- preempt_enable();
+ smp_mb(); /* A matches D */

- return success;
-}
+ /*
+ * If !readers_block the critical section starts here, matched by the
+ * release in percpu_up_write().
+ */
+ if (likely(!smp_load_acquire(&sem->readers_block)))
+ return 1;

-/*
- * Like the normal down_read() this is not recursive, the writer can
- * come after the first percpu_down_read() and create the deadlock.
- *
- * Note: returns with lock_is_held(brw->rw_sem) == T for lockdep,
- * percpu_up_read() does rwsem_release(). This pairs with the usage
- * of ->rw_sem in percpu_down/up_write().
- */
-void percpu_down_read(struct percpu_rw_semaphore *brw)
-{
- might_sleep();
- rwsem_acquire_read(&brw->rw_sem.dep_map, 0, 0, _RET_IP_);
+ /*
+ * Per the above comment, we still have preemption disabled and
+ * will thus decrement on the same CPU as we incremented.
+ */
+ __percpu_up_read(sem);

- if (likely(update_fast_ctr(brw, +1)))
- return;
+ if (try)
+ return 0;

- /* Avoid rwsem_acquire_read() and rwsem_release() */
- __down_read(&brw->rw_sem);
- atomic_inc(&brw->slow_read_ctr);
- __up_read(&brw->rw_sem);
-}
-EXPORT_SYMBOL_GPL(percpu_down_read);
+ /*
+ * We either call schedule() in the wait, or we'll fall through
+ * and reschedule on the preempt_enable() in percpu_down_read().
+ */
+ preempt_enable_no_resched();

-int percpu_down_read_trylock(struct percpu_rw_semaphore *brw)
-{
- if (unlikely(!update_fast_ctr(brw, +1))) {
- if (!__down_read_trylock(&brw->rw_sem))
- return 0;
- atomic_inc(&brw->slow_read_ctr);
- __up_read(&brw->rw_sem);
- }
-
- rwsem_acquire_read(&brw->rw_sem.dep_map, 0, 1, _RET_IP_);
+ /*
+ * Avoid lockdep for the down/up_read(); we already have them.
+ */
+ __down_read(&sem->rw_sem);
+ this_cpu_inc(*sem->read_count);
+ __up_read(&sem->rw_sem);
+
+ preempt_disable();
return 1;
}
+EXPORT_SYMBOL_GPL(__percpu_down_read);

-void percpu_up_read(struct percpu_rw_semaphore *brw)
+void __percpu_up_read(struct percpu_rw_semaphore *sem)
{
- rwsem_release(&brw->rw_sem.dep_map, 1, _RET_IP_);
-
- if (likely(update_fast_ctr(brw, -1)))
- return;
+ smp_mb(); /* B matches C */
+ /*
+ * In other words, if they see our decrement (presumably to aggregate
+ * zero, as that is the only time it matters) they will also see our
+ * critical section.
+ */
+ __this_cpu_dec(*sem->read_count);

- /* false-positive is possible but harmless */
- if (atomic_dec_and_test(&brw->slow_read_ctr))
- wake_up_all(&brw->write_waitq);
+ /* Prod writer to recheck readers_active */
+ wake_up(&sem->writer);
}
-EXPORT_SYMBOL_GPL(percpu_up_read);
+EXPORT_SYMBOL_GPL(__percpu_up_read);
+
+#define per_cpu_sum(var) \
+({ \
+ typeof(var) __sum = 0; \
+ int cpu; \
+ compiletime_assert_atomic_type(__sum); \
+ for_each_possible_cpu(cpu) \
+ __sum += per_cpu(var, cpu); \
+ __sum; \
+})

-static int clear_fast_ctr(struct percpu_rw_semaphore *brw)
+/*
+ * Return true if the modular sum of the sem->read_count per-CPU variable is
+ * zero. If this sum is zero, then it is stable due to the fact that if any
+ * newly arriving readers increment a given counter, they will immediately
+ * decrement that same counter.
+ */
+static bool readers_active_check(struct percpu_rw_semaphore *sem)
{
- unsigned int sum = 0;
- int cpu;
+ if (per_cpu_sum(*sem->read_count) != 0)
+ return false;
+
+ /*
+ * If we observed the decrement, ensure we see the entire critical
+ * section.
+ */

- for_each_possible_cpu(cpu) {
- sum += per_cpu(*brw->fast_read_ctr, cpu);
- per_cpu(*brw->fast_read_ctr, cpu) = 0;
- }
+ smp_mb(); /* C matches B */

- return sum;
+ return true;
}

-void percpu_down_write(struct percpu_rw_semaphore *brw)
+void percpu_down_write(struct percpu_rw_semaphore *sem)
{
+ /* Notify readers to take the slow path. */
+ rcu_sync_enter(&sem->rss);
+
+ down_write(&sem->rw_sem);
+
/*
- * Make rcu_sync_is_idle() == F and thus disable the fast-path in
- * percpu_down_read() and percpu_up_read(), and wait for gp pass.
- *
- * The latter synchronises us with the preceding readers which used
- * the fast-past, so we can not miss the result of __this_cpu_add()
- * or anything else inside their criticial sections.
+ * Notify new readers to block; up until now, and thus throughout the
+ * longish rcu_sync_enter() above, new readers could still come in.
*/
- rcu_sync_enter(&brw->rss);
+ WRITE_ONCE(sem->readers_block, 1);

- /* exclude other writers, and block the new readers completely */
- down_write(&brw->rw_sem);
+ smp_mb(); /* D matches A */

- /* nobody can use fast_read_ctr, move its sum into slow_read_ctr */
- atomic_add(clear_fast_ctr(brw), &brw->slow_read_ctr);
+ /*
+ * If they don't see our write of readers_block, then we are
+ * guaranteed to see their sem->read_count increment, and therefore
+ * will wait for them.
+ */

- /* wait for all readers to complete their percpu_up_read() */
- wait_event(brw->write_waitq, !atomic_read(&brw->slow_read_ctr));
+ /* Wait for all now active readers to complete. */
+ wait_event(sem->writer, readers_active_check(sem));
}
EXPORT_SYMBOL_GPL(percpu_down_write);

-void percpu_up_write(struct percpu_rw_semaphore *brw)
+void percpu_up_write(struct percpu_rw_semaphore *sem)
{
- /* release the lock, but the readers can't use the fast-path */
- up_write(&brw->rw_sem);
/*
- * Enable the fast-path in percpu_down_read() and percpu_up_read()
- * but only after another gp pass; this adds the necessary barrier
- * to ensure the reader can't miss the changes done by us.
+ * Signal that the writer is done; no fast path yet.
+ *
+ * One reason that we cannot just immediately flip to readers_fast is
+ * that new readers might fail to see the results of this writer's
+ * critical section.
+ *
+ * Therefore we force it through the slow path which guarantees an
+ * acquire and thereby guarantees the critical section's consistency.
+ */
+ smp_store_release(&sem->readers_block, 0);
+
+ /*
+ * Release the write lock, this will allow readers back in the game.
+ */
+ up_write(&sem->rw_sem);
+
+ /*
+ * Once this completes (at least one RCU-sched grace period hence) the
+ * reader fast path will be available again. Safe to use outside the
+ * exclusive write lock because it's counting.
*/
- rcu_sync_exit(&brw->rss);
+ rcu_sync_exit(&sem->rss);
}
EXPORT_SYMBOL_GPL(percpu_up_write);
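
The reader/writer ordering of the rewrite can be condensed as follows (a
sketch distilled from the barrier comments above, not additional code):

	reader (slow-path check)		writer
	------------------------		------
	__this_cpu_inc(*read_count);		WRITE_ONCE(sem->readers_block, 1);
	smp_mb();		/* A */		smp_mb();		/* D */
	if (!readers_block)			wait_event(sem->writer,
		/* proceed */				per_cpu_sum(*read_count) == 0);

A pairs with D: either the reader sees readers_block set and backs out
(decrementing on the same CPU, since preemption is disabled throughout), or
the writer sees the reader's increment and waits for it. Likewise B (in
__percpu_up_read()) pairs with C (in readers_active_check()), so a writer
that observes the sum drop to zero also observes the readers' critical
sections.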
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 8a99abf58080..e3b5520005db 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -70,11 +70,14 @@ struct pv_node {
static inline bool pv_queued_spin_steal_lock(struct qspinlock *lock)
{
struct __qspinlock *l = (void *)lock;
- int ret = !(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
- (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0);

- qstat_inc(qstat_pv_lock_stealing, ret);
- return ret;
+ if (!(atomic_read(&lock->val) & _Q_LOCKED_PENDING_MASK) &&
+ (cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0)) {
+ qstat_inc(qstat_pv_lock_stealing, true);
+ return true;
+ }
+
+ return false;
}

/*
@@ -257,7 +260,6 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
static inline bool
pv_wait_early(struct pv_node *prev, int loop)
{
-
if ((loop & PV_PREV_CHECK_MASK) != 0)
return false;

@@ -286,12 +288,10 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)
{
struct pv_node *pn = (struct pv_node *)node;
struct pv_node *pp = (struct pv_node *)prev;
- int waitcnt = 0;
int loop;
bool wait_early;

- /* waitcnt processing will be compiled out if !QUEUED_LOCK_STAT */
- for (;; waitcnt++) {
+ for (;;) {
for (wait_early = false, loop = SPIN_THRESHOLD; loop; loop--) {
if (READ_ONCE(node->locked))
return;
@@ -315,7 +315,6 @@ static void pv_wait_node(struct mcs_spinlock *node, struct mcs_spinlock *prev)

if (!READ_ONCE(node->locked)) {
qstat_inc(qstat_pv_wait_node, true);
- qstat_inc(qstat_pv_wait_again, waitcnt);
qstat_inc(qstat_pv_wait_early, wait_early);
pv_wait(&pn->state, vcpu_halted);
}
@@ -456,12 +455,9 @@ pv_wait_head_or_lock(struct qspinlock *lock, struct mcs_spinlock *node)
pv_wait(&l->locked, _Q_SLOW_VAL);

/*
- * The unlocker should have freed the lock before kicking the
- * CPU. So if the lock is still not free, it is a spurious
- * wakeup or another vCPU has stolen the lock. The current
- * vCPU should spin again.
+ * Because of lock stealing, the queue head vCPU may not be
+ * able to acquire the lock before it has to wait again.
*/
- qstat_inc(qstat_pv_spurious_wakeup, READ_ONCE(l->locked));
}

/*
@@ -544,7 +540,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
* unhash. Otherwise it would be possible to have multiple @lock
* entries, which would be BAD.
*/
- locked = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+ locked = cmpxchg_release(&l->locked, _Q_LOCKED_VAL, 0);
if (likely(locked == _Q_LOCKED_VAL))
return;

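The cmpxchg() -> cmpxchg_release() change relies on unlock being a pure
RELEASE operation: it must publish the critical section before the lock word
reads as free, while the matching ACQUIRE sits on the locking side. A toy
illustration of that pairing (generic kernel primitives, not the qspinlock
code itself):

	static int toy_lock;

	static bool toy_trylock(void)
	{
		/* ACQUIRE: later accesses cannot move before taking the lock */
		return cmpxchg_acquire(&toy_lock, 0, 1) == 0;
	}

	static void toy_unlock(void)
	{
		/* RELEASE: earlier accesses cannot move past freeing the lock */
		smp_store_release(&toy_lock, 0);
	}
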
diff --git a/kernel/locking/qspinlock_stat.h b/kernel/locking/qspinlock_stat.h
index b9d031516254..eb0a599fcf58 100644
--- a/kernel/locking/qspinlock_stat.h
+++ b/kernel/locking/qspinlock_stat.h
@@ -24,8 +24,8 @@
* pv_latency_wake - average latency (ns) from vCPU kick to wakeup
* pv_lock_slowpath - # of locking operations via the slowpath
* pv_lock_stealing - # of lock stealing operations
- * pv_spurious_wakeup - # of spurious wakeups
- * pv_wait_again - # of vCPU wait's that happened after a vCPU kick
+ * pv_spurious_wakeup - # of spurious wakeups in non-head vCPUs
+ * pv_wait_again - # of wait's after a queue head vCPU kick
* pv_wait_early - # of early vCPU wait's
* pv_wait_head - # of vCPU wait's at the queue head
* pv_wait_node - # of vCPU wait's at a non-head queue node
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 447e08de1fab..2337b4bb2366 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -121,16 +121,19 @@ enum rwsem_wake_type {
* - woken process blocks are discarded from the list after having task zeroed
* - writers are only marked woken if downgrading is false
*/
-static struct rw_semaphore *
-__rwsem_mark_wake(struct rw_semaphore *sem,
- enum rwsem_wake_type wake_type, struct wake_q_head *wake_q)
+static void __rwsem_mark_wake(struct rw_semaphore *sem,
+ enum rwsem_wake_type wake_type,
+ struct wake_q_head *wake_q)
{
- struct rwsem_waiter *waiter;
- struct task_struct *tsk;
- struct list_head *next;
- long oldcount, woken, loop, adjustment;
+ struct rwsem_waiter *waiter, *tmp;
+ long oldcount, woken = 0, adjustment = 0;
+
+ /*
+ * Take a peek at the queue head waiter such that we can determine
+ * the wakeup(s) to perform.
+ */
+ waiter = list_first_entry(&sem->wait_list, struct rwsem_waiter, list);

- waiter = list_entry(sem->wait_list.next, struct rwsem_waiter, list);
if (waiter->type == RWSEM_WAITING_FOR_WRITE) {
if (wake_type == RWSEM_WAKE_ANY) {
/*
@@ -142,19 +145,19 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
*/
wake_q_add(wake_q, waiter->task);
}
- goto out;
+
+ return;
}

- /* Writers might steal the lock before we grant it to the next reader.
+ /*
+ * Writers might steal the lock before we grant it to the next reader.
* We prefer to do the first reader grant before counting readers
* so we can bail out early if a writer stole the lock.
*/
- adjustment = 0;
if (wake_type != RWSEM_WAKE_READ_OWNED) {
adjustment = RWSEM_ACTIVE_READ_BIAS;
try_reader_grant:
oldcount = atomic_long_fetch_add(adjustment, &sem->count);
-
if (unlikely(oldcount < RWSEM_WAITING_BIAS)) {
/*
* If the count is still less than RWSEM_WAITING_BIAS
@@ -164,7 +167,8 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
*/
if (atomic_long_add_return(-adjustment, &sem->count) <
RWSEM_WAITING_BIAS)
- goto out;
+ return;
+
/* Last active locker left. Retry waking readers. */
goto try_reader_grant;
}
@@ -176,38 +180,23 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
rwsem_set_reader_owned(sem);
}

- /* Grant an infinite number of read locks to the readers at the front
- * of the queue. Note we increment the 'active part' of the count by
- * the number of readers before waking any processes up.
+ /*
+ * Grant an infinite number of read locks to the readers at the front
+ * of the queue. We know that woken will be at least 1 as we accounted
+ * for above. Note we increment the 'active part' of the count by the
+ * number of readers before waking any processes up.
*/
- woken = 0;
- do {
- woken++;
+ list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
+ struct task_struct *tsk;

- if (waiter->list.next == &sem->wait_list)
+ if (waiter->type == RWSEM_WAITING_FOR_WRITE)
break;

- waiter = list_entry(waiter->list.next,
- struct rwsem_waiter, list);
-
- } while (waiter->type != RWSEM_WAITING_FOR_WRITE);
-
- adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
- if (waiter->type != RWSEM_WAITING_FOR_WRITE)
- /* hit end of list above */
- adjustment -= RWSEM_WAITING_BIAS;
-
- if (adjustment)
- atomic_long_add(adjustment, &sem->count);
-
- next = sem->wait_list.next;
- loop = woken;
- do {
- waiter = list_entry(next, struct rwsem_waiter, list);
- next = waiter->list.next;
+ woken++;
tsk = waiter->task;

wake_q_add(wake_q, tsk);
+ list_del(&waiter->list);
/*
* Ensure that the last operation is setting the reader
* waiter to nil such that rwsem_down_read_failed() cannot
@@ -215,13 +204,16 @@ __rwsem_mark_wake(struct rw_semaphore *sem,
* to the task to wakeup.
*/
smp_store_release(&waiter->task, NULL);
- } while (--loop);
+ }

- sem->wait_list.next = next;
- next->prev = &sem->wait_list;
+ adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+ if (list_empty(&sem->wait_list)) {
+ /* hit end of list above */
+ adjustment -= RWSEM_WAITING_BIAS;
+ }

- out:
- return sem;
+ if (adjustment)
+ atomic_long_add(adjustment, &sem->count);
}

/*
@@ -235,7 +227,6 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
struct task_struct *tsk = current;
WAKE_Q(wake_q);

- /* set up my own style of waitqueue */
waiter.task = tsk;
waiter.type = RWSEM_WAITING_FOR_READ;

@@ -247,7 +238,8 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
/* we're now waiting on the lock, but no longer actively locking */
count = atomic_long_add_return(adjustment, &sem->count);

- /* If there are no active locks, wake the front queued process(es).
+ /*
+ * If there are no active locks, wake the front queued process(es).
*
* If there are no writers and we are first in the queue,
* wake our own waiter to join the existing active readers !
@@ -255,7 +247,7 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
if (count == RWSEM_WAITING_BIAS ||
(count > RWSEM_WAITING_BIAS &&
adjustment != -RWSEM_ACTIVE_READ_BIAS))
- sem = __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+ __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);

raw_spin_unlock_irq(&sem->wait_lock);
wake_up_q(&wake_q);
@@ -505,7 +497,7 @@ __rwsem_down_write_failed_common(struct rw_semaphore *sem, int state)
if (count > RWSEM_WAITING_BIAS) {
WAKE_Q(wake_q);

- sem = __rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
+ __rwsem_mark_wake(sem, RWSEM_WAKE_READERS, &wake_q);
/*
* The wakeup is normally called _after_ the wait_lock
* is released, but given that we are proactively waking
@@ -614,9 +606,8 @@ struct rw_semaphore *rwsem_wake(struct rw_semaphore *sem)
raw_spin_lock_irqsave(&sem->wait_lock, flags);
locked:

- /* do nothing if list empty */
if (!list_empty(&sem->wait_list))
- sem = __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);
+ __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);

raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
wake_up_q(&wake_q);
@@ -638,9 +629,8 @@ struct rw_semaphore *rwsem_downgrade_wake(struct rw_semaphore *sem)

raw_spin_lock_irqsave(&sem->wait_lock, flags);

- /* do nothing if list empty */
if (!list_empty(&sem->wait_list))
- sem = __rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED, &wake_q);
+ __rwsem_mark_wake(sem, RWSEM_WAKE_READ_OWNED, &wake_q);

raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
wake_up_q(&wake_q);
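
A worked example of the accounting in the rewritten __rwsem_mark_wake(): with
one reader grant taken up front (adjustment == RWSEM_ACTIVE_READ_BIAS) and
three readers woken by the single scan, the final delta applied to sem->count
is

	3 * RWSEM_ACTIVE_READ_BIAS - RWSEM_ACTIVE_READ_BIAS
		- RWSEM_WAITING_BIAS	/* only if the wait_list drained */

i.e. the two outstanding reader grants, plus removal of the waiting bias once
nobody is left queued.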
diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index be922c9f3d37..50d1861f7759 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -68,6 +68,8 @@ void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
RCU_LOCKDEP_WARN(!gp_ops[rsp->gp_type].held(),
"suspicious rcu_sync_is_idle() usage");
}
+
+EXPORT_SYMBOL_GPL(rcu_sync_lockdep_assert);
#endif

/**
@@ -83,6 +85,18 @@ void rcu_sync_init(struct rcu_sync *rsp, enum rcu_sync_type type)
}

/**
+ * Must be called after rcu_sync_init() and before first use.
+ *
+ * Ensures rcu_sync_is_idle() returns false and rcu_sync_{enter,exit}()
+ * pairs turn into NO-OPs.
+ */
+void rcu_sync_enter_start(struct rcu_sync *rsp)
+{
+ rsp->gp_count++;
+ rsp->gp_state = GP_PASSED;
+}
+
+/**
* rcu_sync_enter() - Force readers onto slowpath
* @rsp: Pointer to rcu_sync structure to use for synchronization
*
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 4a1ca5f6da7e..ae6f41fb9cba 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -20,7 +20,6 @@
#include <linux/kallsyms.h>
#include <linux/smpboot.h>
#include <linux/atomic.h>
-#include <linux/lglock.h>
#include <linux/nmi.h>

/*
@@ -47,13 +46,9 @@ struct cpu_stopper {
static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
static bool stop_machine_initialized = false;

-/*
- * Avoids a race between stop_two_cpus and global stop_cpus, where
- * the stoppers could get queued up in reverse order, leading to
- * system deadlock. Using an lglock means stop_two_cpus remains
- * relatively cheap.
- */
-DEFINE_STATIC_LGLOCK(stop_cpus_lock);
+/* static data for stop_cpus */
+static DEFINE_MUTEX(stop_cpus_mutex);
+static bool stop_cpus_in_progress;

static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
{
@@ -230,14 +225,26 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1);
struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2);
int err;
-
- lg_double_lock(&stop_cpus_lock, cpu1, cpu2);
+retry:
spin_lock_irq(&stopper1->lock);
spin_lock_nested(&stopper2->lock, SINGLE_DEPTH_NESTING);

err = -ENOENT;
if (!stopper1->enabled || !stopper2->enabled)
goto unlock;
+ /*
+ * Ensure that if we race with __stop_cpus() the stoppers won't get
+ * queued up in reverse order leading to system deadlock.
+ *
+ * We can't miss stop_cpus_in_progress if queue_stop_cpus_work() has
+ * queued a work on cpu1 but not on cpu2; we hold both locks.
+ *
+ * It can be spuriously true, but it is safe to spin until it is cleared;
+ * queue_stop_cpus_work() does everything under preempt_disable().
+ */
+ err = -EDEADLK;
+ if (unlikely(stop_cpus_in_progress))
+ goto unlock;

err = 0;
__cpu_stop_queue_work(stopper1, work1);
@@ -245,8 +252,12 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1,
unlock:
spin_unlock(&stopper2->lock);
spin_unlock_irq(&stopper1->lock);
- lg_double_unlock(&stop_cpus_lock, cpu1, cpu2);

+ if (unlikely(err == -EDEADLK)) {
+ while (stop_cpus_in_progress)
+ cpu_relax();
+ goto retry;
+ }
return err;
}
/**
@@ -316,9 +327,6 @@ bool stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
return cpu_stop_queue_work(cpu, work_buf);
}

-/* static data for stop_cpus */
-static DEFINE_MUTEX(stop_cpus_mutex);
-
static bool queue_stop_cpus_work(const struct cpumask *cpumask,
cpu_stop_fn_t fn, void *arg,
struct cpu_stop_done *done)
@@ -332,7 +340,8 @@ static bool queue_stop_cpus_work(const struct cpumask *cpumask,
* preempted by a stopper which might wait for other stoppers
* to enter @fn which can lead to deadlock.
*/
- lg_global_lock(&stop_cpus_lock);
+ preempt_disable();
+ stop_cpus_in_progress = true;
for_each_cpu(cpu, cpumask) {
work = &per_cpu(cpu_stopper.stop_work, cpu);
work->fn = fn;
@@ -341,7 +350,8 @@ static bool queue_stop_cpus_work(const struct cpumask *cpumask,
if (cpu_stop_queue_work(cpu, work))
queued = true;
}
- lg_global_unlock(&stop_cpus_lock);
+ stop_cpus_in_progress = false;
+ preempt_enable();

return queued;
}
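
The replacement protocol can be summarized like this (a sketch of the
interaction, not additional code):

	cpu_stop_queue_two_works()		queue_stop_cpus_work()
	--------------------------		----------------------
	retry:					preempt_disable();
	lock stopper1, stopper2;		stop_cpus_in_progress = true;
	if (stop_cpus_in_progress)		for_each_cpu(cpu, cpumask)
		unlock, spin, goto retry;		queue stop work;
	queue work1, work2;			stop_cpus_in_progress = false;
	unlock;					preempt_enable();

Because queue_stop_cpus_work() runs entirely with preemption disabled,
stop_cpus_in_progress can only be observed set transiently, so the
cpu_relax() spin in the retry path is bounded and the two paths can no
longer queue their stop works in conflicting order.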