[PATCH 00/10] V3: rwsem changes + down_read_critical() proposal

From: Michel Lespinasse
Date: Mon May 17 2010 - 18:27:34 EST

This is version 3 of my rwsem changes. Patches 7 and 10 were modified to
address Linus's comments about the API. Please consider for merging.

Changes since V2:

- Rebased to 2.6.34

- Changed patch 07 to address Linus's comments about the API.
down_read_critical() and up_read_critical() now work as a pair. Threads
using them are allowed to skip over blocked threads when acquiring the
read lock; in exchange, they must release the lock quickly and, in
particular, they are forbidden to block while holding it.

- Changed patch 10 to use the down_read_critical()/up_read_critical()
API when accessing the /proc/<pid>/exe and /proc/<pid>/maps files.
I excluded the smaps and numa_maps files, which can actually block while
being generated. smaps blocks in smaps_pte_range() by calling cond_resched(),
which seems legitimate as it's a potentially heavy operation. numa_maps
blocks in show_numa_map() doing a kzalloc() of struct numa_maps, which
should probably be done in do_maps_open() instead.

The motivation for this change was some cluster monitoring software we
use at Google, which reads the /proc/<pid>/maps files for all running
processes. When the machines are under load, the mmap_sem is often
acquired for reads for long periods of time, since do_page_fault() holds
it while doing disk accesses. Once a writer queues behind those readers,
fair queueing makes new readers wait behind the writer as well, and the
monitoring software often ends up making little progress. By introducing
unfair behavior in a few selected places, we are able to let the
monitoring software make progress without impacting performance for
the rest of the system. I've made sure not to change the rwsem fast
paths in implementing this proposal.

Michel Lespinasse (10):
x86 rwsem: minor cleanups
rwsem: fully separate code paths to wake writers vs readers
rwsem: lighter active count checks when waking up readers
rwsem: let RWSEM_WAITING_BIAS represent any number of waiting threads
rwsem: wake queued readers when writer blocks on active read lock
rwsem: smaller wrappers around rwsem_down_failed_common
generic rwsem: implement down_read_critical() / up_read_critical()
rwsem: down_read_critical infrastructure support
x86 rwsem: down_read_critical implementation
Use down_read_critical() for /proc/<pid>/exe and /proc/<pid>/maps files

arch/x86/include/asm/rwsem.h   |  70 ++++++++++++-----
arch/x86/lib/rwsem_64.S        |  14 +++-
arch/x86/lib/semaphore_32.S    |  21 +++++-
fs/proc/base.c                 |   4 +-
fs/proc/task_mmu.c             |  24 ++++--
include/linux/proc_fs.h        |   1 +
include/linux/rwsem-spinlock.h |  10 ++-
include/linux/rwsem.h          |  12 +++
kernel/rwsem.c                 |  35 +++++++++
lib/rwsem-spinlock.c           |  10 ++-
lib/rwsem.c                    | 160 ++++++++++++++++++++++++--------------
11 files changed, 266 insertions(+), 95 deletions(-)