On Thu, 2008-11-27 at 01:28 -0800, Mike Waychison wrote:
> Correct. I don't recall the numbers from the pathological cases we were seeing, but iirc, it was on the order of tens of seconds, likely exacerbated by slower than usual disks. I've been digging through my inbox to find numbers without much success -- we've been using a variant of this patch since 2.6.11.
> We generally try to avoid such things, but sometimes it a) can't be easily avoided (third party libraries for instance) and b) when it hits us, it affects the overall health of the machine/cluster (the monitoring daemons get blocked, which isn't very healthy).
If it's only monitoring, there might be another solution: keep the
required data in a separate (approximate) copy, so that you don't
need mmap_sem at all to show it.
If your mmap_sem is so contended that your latencies are unacceptable,
adding more users to it, even statistics gathering, just isn't going to
cure the situation.
Furthermore, /proc code usually isn't written with performance in mind,
so it tends to be simple and robust code. Adding it to a 'hot'-path like
you're doing doesn't seem advisable.
Also, releasing and re-acquiring mmap_sem can significantly add to the
cacheline bouncing that lock already suffers from.