Re: Subject: [RFC MM] mmap_sem scaling: Use mutex and percpu counter instead

From: Andi Kleen
Date: Thu Nov 05 2009 - 15:56:25 EST


Christoph Lameter <cl@xxxxxxxxxxxxxxxxxxxx> writes:

> Instead of a rw semaphore use a mutex and a per cpu counter for the number
> of the current readers. Read locking then becomes very cheap, requiring only
> the increment of a per cpu counter.
>
> Write locking is more expensive since the writer must scan the percpu array
> and wait until all readers are complete. Since the readers are not holding
> semaphores we have no wait queue from which the writer could wake up. In this
> draft we simply wait for one millisecond between scans of the percpu
> array. A different solution must be found there.
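To make the trade-off concrete, here is a minimal userspace sketch of the scheme as quoted, using C11 atomics and POSIX threads rather than kernel primitives. All names (`pcpu_rwsem`, `NR_SLOTS`, `writer_active`) are hypothetical. A caller-chosen slot stands in for the current CPU, so unlike a real per-cpu counter each reader must release the same slot it locked. The `writer_active` flag is an assumption about how readers notice a pending writer, which the quoted description does not spell out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the number of CPUs. */
#define NR_SLOTS 8

struct pcpu_rwsem {
    pthread_mutex_t writer_lock;   /* serializes writers */
    atomic_int writer_active;      /* assumed flag: set while a writer wants or holds the lock */
    atomic_long readers[NR_SLOTS]; /* per-slot count of active readers */
};

static void pcpu_rwsem_init(struct pcpu_rwsem *s)
{
    pthread_mutex_init(&s->writer_lock, NULL);
    atomic_init(&s->writer_active, 0);
    for (int i = 0; i < NR_SLOTS; i++)
        atomic_init(&s->readers[i], 0);
}

/* Fast path: one atomic increment on the caller's own slot plus one load. */
static void pcpu_read_lock(struct pcpu_rwsem *s, int slot)
{
    for (;;) {
        atomic_fetch_add(&s->readers[slot], 1);   /* seq_cst */
        if (!atomic_load(&s->writer_active))
            return;                                /* no writer: read lock held */
        /* A writer is active: back out, then block on the writer's mutex. */
        atomic_fetch_sub(&s->readers[slot], 1);
        pthread_mutex_lock(&s->writer_lock);
        pthread_mutex_unlock(&s->writer_lock);
    }
}

static void pcpu_read_unlock(struct pcpu_rwsem *s, int slot)
{
    long prev = atomic_fetch_sub(&s->readers[slot], 1);
    assert(prev > 0);   /* the slot released must be the slot that was locked */
    (void)prev;
}

/* Slow path: take the mutex, announce the writer, then poll every slot. */
static void pcpu_write_lock(struct pcpu_rwsem *s)
{
    const struct timespec one_ms = { 0, 1000000L };

    pthread_mutex_lock(&s->writer_lock);
    atomic_store(&s->writer_active, 1);            /* seq_cst store, then loads below */
    for (int i = 0; i < NR_SLOTS; i++) {
        /* No wait queue to sleep on: wait 1 ms and rescan, as in the draft. */
        while (atomic_load(&s->readers[i]) != 0)
            nanosleep(&one_ms, NULL);
    }
}

static void pcpu_write_unlock(struct pcpu_rwsem *s)
{
    atomic_store(&s->writer_active, 0);
    pthread_mutex_unlock(&s->writer_lock);
}
```

Because both the reader's increment/load and the writer's store/load are sequentially consistent, at least one side sees the other: either the reader backs out, or the writer sees a non-zero slot and keeps waiting. The asymmetry is also visible: a reader pays one increment and one load, while every writer takes the mutex, walks all `NR_SLOTS` counters, and sleeps in 1 ms steps whenever a reader is still inside.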

I'm not sure making all writers more expensive is really a good idea.

For example, it will definitely impact the AIM7 multi-brk() issue
and the mysql allocation case, both of which are writer-intensive. I assume
doing a lot of mmaps/brks in parallel is not that uncommon.

My thinking was more that we simply need per-VMA locking, or some
other locking per larger address range. Unfortunately, that
needs changes in a lot of users that mess with the VMA lists
(and perhaps really needs some better abstractions for VMA list
management first).

That said, addressing the convoying issues in the current
semaphores, which is what your patch does, would also be a good idea.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.