Re: [PATCH v3 1/3] lib/percpu-list: Per-cpu list with associated per-cpu locks

From: Waiman Long
Date: Wed Feb 24 2016 - 14:51:44 EST


On 02/24/2016 02:56 AM, Jan Kara wrote:
On Tue 23-02-16 14:04:30, Waiman Long wrote:
Linked list is used everywhere in the Linux kernel. However, if many
threads are trying to add or delete entries into the same linked list,
it can create a performance bottleneck.

This patch introduces a new per-cpu list subystem with associated
per-cpu locks for protecting each of the lists individually. This
allows list entries insertion and deletion operations to happen in
parallel instead of being serialized with a global list and lock.

List entry insertion is strictly per cpu. List deletion, however, can
happen in a cpu other than the one that did the insertion. So we still
need lock to protect the list. Because of that, there may still be
a small amount of contention when deletion is being done.

A new header file include/linux/percpu-list.h will be added with the
associated pcpu_list_head and pcpu_list_node structures. The following
functions are provided to manage the per-cpu list:

1. int init_pcpu_list_head(struct pcpu_list_head **ppcpu_head)
2. void pcpu_list_add(struct pcpu_list_node *node,
struct pcpu_list_head *head)
3. void pcpu_list_del(struct pcpu_list *node)

Iteration of all the list entries within a group of per-cpu
lists is done by calling either the pcpu_list_iterate() or
pcpu_list_iterate_safe() functions in a while loop. They correspond
to the list_for_each_entry() and list_for_each_entry_safe() macros
respectively. The iteration states are keep in a pcpu_list_state
structure that is passed to the iteration functions.

Signed-off-by: Waiman Long<Waiman.Long@xxxxxxx>
Two comments below.

+/*
+ * Helper function to find the first entry of the next per-cpu list
+ * It works somewhat like for_each_possible_cpu(cpu).
+ *
+ * Return: true if the entry is found, false if all the lists exhausted
+ */
+static __always_inline bool
+__pcpu_list_next_cpu(struct pcpu_list_head *head, struct pcpu_list_state *state)
+{
+ if (state->lock)
+ spin_unlock(state->lock);
+next_cpu:
+ /*
+ * for_each_possible_cpu(cpu)
+ */
+ state->cpu = cpumask_next(state->cpu, cpu_possible_mask);
+ if (state->cpu>= nr_cpu_ids)
+ return false; /* All the per-cpu lists iterated */
+
+ state->head =&per_cpu_ptr(head, state->cpu)->list;
+ state->lock =&per_cpu_ptr(head, state->cpu)->lock;
+ state->curr = list_entry(state->head->next,
+ struct pcpu_list_node, list);
+ if (&state->curr->list == state->head)
+ goto next_cpu;
This might be more comprehensible as:

if (list_empty(state->head))
goto next_cpu;

and you can do it just after updating state->head (no need to init
state->lock& state->curr if the list is empty).

Thank for the suggestion. Will change the code accordingly.
Another note: Initialization of state->curr is IMO racy - you need to hold
state->lock to grab state->curr reliably, don't you? Otherwise someone can
remove the entry while you are working with it. So you need to move that
down just before returning.

Right. I will move the initialization of state->curr after the spin_lock().

+
+ spin_lock(state->lock);
+ return true;
+}
+#endif /* NR_CPUS == 1 */
...

+/*
+ * Delete a node from a percpu list
+ *
+ * We need to check the lock pointer again after taking the lock to guard
+ * against concurrent delete of the same node. If the lock pointer changes
+ * (becomes NULL or to a different one), we assume that the deletion was done
+ * elsewhere.
+ */
+void pcpu_list_del(struct pcpu_list_node *node)
+{
+ spinlock_t *lock = READ_ONCE(node->lockptr);
+
+ if (unlikely(!lock))
+ return;
+
+ spin_lock(lock);
+ if (likely(lock == node->lockptr)) {
+ list_del_init(&node->list);
+ node->lockptr = NULL;
+ }
But someone changing lockptr under your hands would mean that there are
two processes racing to remove entries and that would generally point to a
problem (and likely use-after-free) in the caller, won't it? Or do you have
some particular usecase in mind?

Honza


This is just defensive programming to guard against unforeseen case. I don't have any particular use case in mind that will make that happen. Maybe I should put a WARN_ON if this really happens.

Cheers,
Longman