[PATCH v2 0/4] ipc: reduce ipc lock contention

From: Davidlohr Bueso
Date: Tue Mar 05 2013 - 04:36:01 EST


Hi,

The following set of patches is based on the discussion about holding the
ipc lock unnecessarily, such as for permission and security checks:

https://lkml.org/lkml/2013/2/28/540

Patch 1/4: Remove the bogus comment from ipc_checkid() requiring that
the ipc lock be held before calling it. Also simplify the function
return. This is a new patch, not present in the RFC.
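
As a rough illustration of the "simplify the function return" part, here is
a userspace sketch. It assumes the check compares the sequence number encoded
in the id against the one stored in the object; the struct, the function
names and the SEQ_MULTIPLIER value are stand-ins for the example and are not
copied from the patch.

```c
#include <assert.h>

/* Assumed stand-in value; the kernel derives this from its ipc id limits. */
#define SEQ_MULTIPLIER 32768

/* Hypothetical stand-in for struct kern_ipc_perm; only the field used here. */
struct ipc_perm_sketch {
	unsigned long seq;
};

/* Before: explicit branch that returns 1 on mismatch, 0 otherwise. */
int checkid_old(const struct ipc_perm_sketch *ipcp, int id)
{
	if ((unsigned long)(id / SEQ_MULTIPLIER) != ipcp->seq)
		return 1;
	return 0;
}

/* After: return the result of the comparison directly. */
int checkid_new(const struct ipc_perm_sketch *ipcp, int id)
{
	return (unsigned long)(id / SEQ_MULTIPLIER) != ipcp->seq;
}
```

Both variants return the same value for every input; only the shape of the
return changes.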

Patch 2/4: Introduce functions to obtain the ipc object without holding
the lock. Two functions, ipc_obtain_object() and
ipc_obtain_object_check(), are created, which are analogous to
ipc_lock() and ipc_lock_check(). This patch was acked by Michel
Lespinasse and reviewed by Chegu Vinod.

Patch 3/4: Introduce the ipcctl_pre_down_nolock() function, which is a
lockless version of ipcctl_pre_down(). This function is common to sem,
msg and shm and does some common checking for the IPC_RMID and IPC_SET
commands. The older version is kept but now calls the lockless version
without changing its semantics, so the change is transparent to existing
users. This was suggested by Linus. Once all users are updated, the
ipcctl_pre_down() function can be removed.
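
The relationship between the two entry points can be sketched in userspace
as follows. This is only an illustration under simplifying assumptions: the
struct, the uid-based permission rule, the `locked` flag standing in for the
spinlock, and the use of NULL instead of an error pointer are all made up
for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ipc object. */
struct obj_sketch {
	int owner_uid;
	int locked;	/* stand-in for the per-object spinlock */
};

void obj_lock(struct obj_sketch *o)
{
	assert(!o->locked);
	o->locked = 1;
}

void obj_unlock(struct obj_sketch *o)
{
	assert(o->locked);
	o->locked = 0;
}

/* Lockless variant: performs the common permission check only. */
struct obj_sketch *pre_down_nolock(struct obj_sketch *o, int caller_uid)
{
	if (caller_uid != 0 && caller_uid != o->owner_uid)
		return NULL;	/* simplification: NULL instead of an error pointer */
	return o;
}

/* Older entry point: same contract as before, returns with the lock held. */
struct obj_sketch *pre_down(struct obj_sketch *o, int caller_uid)
{
	struct obj_sketch *res = pre_down_nolock(o, caller_uid);

	if (res)
		obj_lock(res);
	return res;
}
```

Callers of pre_down() still get the object back locked on success, while
converted callers can use pre_down_nolock() and take the lock later, only
when they need it.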

Patch 4/4: Use the new, lockless, functions introduced above to only
hold the ipc lock when necessary. The idea is simple: only check ipc
security and permissions within the rcu read region, *without* holding
the ipc lock. This patch was acked by Michel Lespinasse and reviewed by
Chegu Vinod.
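
The change in ordering described for patch 4 can be pictured with the
userspace sketch below. It is not the kernel code: the rcu calls are no-op
placeholders, the lock is a plain flag, and the struct, the function names
and the checks_under_lock counter are all invented for the illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a semaphore array. */
struct sem_sketch {
	int owner_uid;
	int value;
	int locked;		/* stand-in for the ipc spinlock */
	int checks_under_lock;	/* permission checks done while locked */
};

/* No-op placeholders marking where rcu_read_lock/unlock would sit. */
void rcu_read_lock_sketch(void) {}
void rcu_read_unlock_sketch(void) {}

int has_perm(struct sem_sketch *s, int uid)
{
	if (s->locked)
		s->checks_under_lock++;
	return uid == 0 || uid == s->owner_uid;
}

/* Old shape: take the lock first, then do the permission check. */
int op_locked_check(struct sem_sketch *s, int uid, int delta)
{
	int err = 0;

	rcu_read_lock_sketch();
	s->locked = 1;
	if (!has_perm(s, uid))
		err = -EACCES;
	else
		s->value += delta;
	s->locked = 0;
	rcu_read_unlock_sketch();
	return err;
}

/* New shape: check inside the read section, lock only for the update. */
int op_lockless_check(struct sem_sketch *s, int uid, int delta)
{
	int err = 0;

	rcu_read_lock_sketch();
	if (!has_perm(s, uid)) {
		err = -EACCES;
		goto out;
	}
	s->locked = 1;
	s->value += delta;
	s->locked = 0;
out:
	rcu_read_unlock_sketch();
	return err;
}
```

In the second shape a failed permission check never touches the lock at all,
and a successful one holds it only across the update itself.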

Changes since v1 (RFC):
- Add patches 1 and 3.

- Patch 2: In ipc_lock(), instead of checking the return of
ipc_obtain_object_check() against NULL, use IS_ERR(). Suggested by
Michel Lespinasse.

- Patch 2,4: In order for the rcu read lock/unlock calls to be paired up
more obviously, force the user to call rcu_read_lock *before* calling
ipc_obtain_object[_check](). Suggested by Michel Lespinasse.

- Patch 4: Return ERR_CAST() in sem_obtain_object[_check]() instead of a
cast to struct sem_array *. Suggested by Linus.

- Patch 4: Change open coded spin_lock calls to ipc_object_lock in
semaphore code. Suggested by Linus.

- Patch 4: Added an 'out_wakeup' label to semctl_main() and semtimedop()
so that these functions can return without calling sem_unlock (and hence
spin_unlock) when the lock is not held.

- More tests: For the past few days I've been running this patchset on
my own laptop, and on 2 and 8 socket machines running my Oracle
swingbench workloads. I have not encountered any issues so far. The main
fix, suggested by Linus, was for the bogus ipcctl_pre_down() changes that
were made without updating the callers.
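
For readers less familiar with the IS_ERR()/ERR_CAST() items above, the
error-pointer convention they rely on can be sketched in userspace like
this. The lowercase helpers mimic the kernel macros but are reimplemented
here as an assumption for illustration; the structs, the table and the
lookup functions are likewise hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR(), IS_ERR() and ERR_CAST(). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	/* Error codes occupy the top MAX_ERRNO values of the address range. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline void *err_cast(const void *ptr)
{
	return (void *)ptr;	/* same error value, different pointer type */
}

struct perm_sketch { int id; };
struct sem_sketch  { struct perm_sketch perm; int nsems; };

static struct sem_sketch table[2] = { { { 0 }, 4 }, { { 1 }, 8 } };

/* Generic lookup: returns the object or an encoded error, never NULL. */
struct perm_sketch *obtain_object(int id)
{
	if (id < 0 || id >= 2)
		return err_ptr(-EINVAL);
	return &table[id].perm;
}

/* Type-specific wrapper: forwards errors via err_cast(). */
struct sem_sketch *sem_obtain_object(int id)
{
	struct perm_sketch *p = obtain_object(id);

	if (is_err(p))
		return err_cast(p);
	/* perm is the first member, so the addresses coincide. */
	return (struct sem_sketch *)p;
}
```

Because a failed lookup is encoded as an error pointer rather than NULL, a
caller that compares against NULL would miss the failure, which is why the
check has to use the IS_ERR() form.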

Ok, some numbers...

1) With Rik's semop-multi.c microbenchmark we can see the following
results:

Baseline (3.9-rc1):
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 151452270, ops/sec 5048409

+ 59.40% a.out [kernel.kallsyms] [k] _raw_spin_lock
+ 6.14% a.out [kernel.kallsyms] [k] sys_semtimedop
+ 3.84% a.out [kernel.kallsyms] [k] avc_has_perm_flags
+ 3.64% a.out [kernel.kallsyms] [k] __audit_syscall_exit
+ 2.06% a.out [kernel.kallsyms] [k] copy_user_enhanced_fast_string
+ 1.86% a.out [kernel.kallsyms] [k] ipc_lock

With this patchset:
cpus 4, threads: 256, semaphores: 128, test duration: 30 secs
total operations: 273156400, ops/sec 9105213

+ 18.54% a.out [kernel.kallsyms] [k] _raw_spin_lock
+ 11.72% a.out [kernel.kallsyms] [k] sys_semtimedop
+ 7.70% a.out [kernel.kallsyms] [k] ipc_has_perm.isra.21
+ 6.58% a.out [kernel.kallsyms] [k] avc_has_perm_flags
+ 6.54% a.out [kernel.kallsyms] [k] __audit_syscall_exit
+ 4.71% a.out [kernel.kallsyms] [k] ipc_obtain_object_check
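
To put the two runs side by side, the ratio of the totals can be computed
with a trivial helper like the one below (the function name is made up for
this note). With the figures above, 273156400 / 151452270 comes out at
roughly 1.80, i.e. about 80% more operations in the same 30 seconds.

```c
#include <assert.h>

/* Ratio of patched to baseline operation counts over equal durations. */
double throughput_ratio(double baseline_ops, double patched_ops)
{
	return patched_ops / baseline_ops;
}
```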


2) On an Oracle swingbench DSS (data mining) workload the improvements
are not as exciting as with Rik's benchmark, but we can still see some
positive numbers. For an 8 socket machine the following are the
percentages of %sys time incurred in the ipc lock:

Baseline (3.9-rc1):
100 swingbench users: 8.74%
400 swingbench users: 21.86%
800 swingbench users: 84.35%

With this patchset:
100 swingbench users: 8.11%
400 swingbench users: 19.93%
800 swingbench users: 77.69%

Thanks,
Davidlohr



