Re: [PATCH v9 5/5] lib/dlock-list: Scale dlock_lists_empty()

From: Davidlohr Bueso
Date: Tue Oct 16 2018 - 22:52:02 EST


On Thu, 04 Oct 2018, Waiman Long wrote:

On 10/04/2018 03:16 AM, Jan Kara wrote:
On Wed 12-09-18 15:28:52, Waiman Long wrote:
From: Davidlohr Bueso <dave@xxxxxxxxxxxx>

Instead of the current O(N) implementation, at the cost
of adding an atomic counter, we can convert the call to
an atomic_read(). The counter only serves for accounting
empty to non-empty transitions, and vice versa; therefore
only modified twice for each of the lists during the
lifetime of the dlock (while used).

In addition, to be able to unaccount a list_del(), we
add a dlist pointer to each head, thus minimizing the
overall memory footprint.

Signed-off-by: Davidlohr Bueso <dbueso@xxxxxxx>
Acked-by: Waiman Long <longman@xxxxxxxxxx>
So I was wondering: Is this really worth it? AFAICS we have a single call
site for dlock_lists_empty() and that happens during umount where we don't
really care about this optimization. So it seems like unnecessary
complication to me at this point? If someone comes up with a usecase that
needs fast dlock_lists_empty(), then sure, we can do this...


Yes, that is true. We can skip this patch for the time being until a use
case comes up which requires dlock_lists_empty() to be used in the fast
path.

So fyi I ended up porting the epoll ready-list to this api, where
dlock_lists_empty() performance _does_ matter. However, the list
iteration is common enough operation to put perform the benefits of
the percpu add/delete operations. For example, when sending ready
events to userspace (ep_send_events_proc()), each item must drop the
iter lock, and also do a delete operation -- similarly for checking
for ready events (ep_read_events_proc). This ends hurting more than
benefiting workloads.

Anyway, so yeah I have no need for this patch, and the added complexity +
atomics is unjustified.

Thanks,
Davidlohr