Re: [PATCH v3 1/4] lib/dlock-list: Distributed and lock-protected lists

From: Waiman Long
Date: Tue Jul 19 2016 - 14:42:52 EST


On 07/18/2016 07:38 PM, Tejun Heo wrote:
Hello, Waiman.

On Fri, Jul 15, 2016 at 01:39:40PM -0400, Waiman Long wrote:
Suggested-by: Tejun Heo<tj@xxxxxxxxxx>
Not sure I should be on suggested-by given that this wasn't my idea at
all.

I put the tag there because of your suggestion to restructure the data structure which did make the patch better. I can remove that tag if you think it is not appropriate.

+/*
+ * include/linux/dlock-list.h
+ *
+ * A distributed (per-cpu) set of lists each of which is protected by its
+ * own spinlock, but acts like a single consolidated list to the callers.
+ *
+ * The dlock_list_head_percpu structure contains the spinlock, the other
+ * dlock_list_node structures only contains a pointer to the spinlock in
+ * dlock_list_head_percpu.
+ */
The more I think about it, the more bothered I'm about the dlock_list
name. For the most part, this isn't different from other percpu data
structures in the kernel. Sure, it might benefit from doing Nth cpu,
but so are other percpu data structures and it's not just "distributed
lock" list either. The list itself is percpu, not just locking. Can
we please go back to percpu_list? Christoph, what do you think?


As I said before, I don't mind reverting the name back to percpu_list. I am just waiting for a final agreement.

+struct dlock_list_node {
+ struct list_head list;
+ spinlock_t *lockptr;
+};
Wouldn't it be better to point to dlock_list_percpu?

I could. However, the only thing that matter is the spinlock that protects the list entry.

+#define DLOCK_LIST_HEAD_PERCPU_INIT(name) \
+ { \
+ .list.prev =&name.list, \
+ .list.next =&name.list, \
Use LIST_HEAD_INIT()? Also, why do we even need the initializers if
the data structure can only be dynamically allocated. In fact, do
the definitions even need to be exposed in the header?

I put it there for completeness sake. You are right. That macro isn't currently used. I will remove it in the next iteration of the patch.

+ .list.lock = __SPIN_LOCK_UNLOCKED(name), \
+ }
+
+/*
+ * dlock list iteration state
+ */
+struct dlock_list_iter {
+ int cpu;
^
I'm not sure lining up with space here is common in kernel.

OK, I will remove the extra spaces to make it more conformant to the kernel style.

+ spinlock_t *lock;
+ struct list_head *head; /* List head of current per-cpu list */
+ struct dlock_list_node *curr;
+ struct dlock_list_node *next;
+};
+
+#define DLOCK_LIST_ITER_INIT() \
^
Don't we usually omit () in these cases?

Good point. Will remove the unneeded ().

+ { \
+ .cpu = -1, \
+ }
+
+#define DEFINE_DLOCK_LIST_ITER(s) \
+ struct dlock_list_iter s = DLOCK_LIST_ITER_INIT()
+
+static inline void init_dlock_list_iter(struct dlock_list_iter *iter)
+{
+ *iter = (struct dlock_list_iter)DLOCK_LIST_ITER_INIT();
+}
+
+#define DLOCK_LIST_NODE_INIT(name) \
+ { \
+ .list.prev =&name.list, \
+ .list.next =&name.list, \
^
LIST_HEAD_INIT()?

Will make the change.

+ }
+
+static inline void init_dlock_list_node(struct dlock_list_node *node)
+{
+ INIT_LIST_HEAD(&node->list);
+ node->lockptr = NULL;
Why not use DLOCK_LIST_NODE_INIT()?


Yes, I can make the change.

+}
+
+/*
+ * Check if all the dlock lists are empty
+ *
+ * This can be a pretty expensive function call. If this function is required
+ * in a performance critical path, we may have to maintain a global count
+ * of the list entries in the global dlock_list_head structure instead.
+ */
/** function comment please.

Sure.

+static inline bool dlock_list_empty(struct dlock_list_head *dlist)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ if (!list_empty(&per_cpu_ptr(dlist->head, cpu)->list))
+ return false;
+ return true;
+}
+
+/*
+ * Allocation and freeing of dlock list
+ */
+extern int alloc_dlock_list_head(struct dlock_list_head *dlist);
^
ditto with alignment

Will remove the extra space.

+extern void free_dlock_list_head(struct dlock_list_head *dlist);
+
+/*
+ * The dlock list iteration functions which return true if iteration has
+ * to be continued.
+ */
+extern bool dlock_list_next(struct dlock_list_head *dlist,
+ struct dlock_list_iter *iter);
+extern bool dlock_list_next_safe(struct dlock_list_head *dlist,
+ struct dlock_list_iter *iter);
Why not return dlock_list_node * for the current node? That'd more
conventional and allows dlock_list_iter to be opaque.

Yes, I can make it return dlock_list_node *.

However, to make dlock_list_iter opaque, I will have to dynamically allocate the structure. That will add an extra memory allocation and free calls as well as handling the error case of running out of memory. I don't think that is worth doing at this point.

diff --git a/lib/dlock-list.c b/lib/dlock-list.c
new file mode 100644
index 0000000..af4a9f3
--- /dev/null
+++ b/lib/dlock-list.c
...
+int alloc_dlock_list_head(struct dlock_list_head *dlist)
+{
+ struct dlock_list_head dlist_tmp;
+ int cpu;
+
+ dlist_tmp.head = alloc_percpu(struct dlock_list_head_percpu);
+ if (!dlist_tmp.head)
+ return -ENOMEM;
+
+ for_each_possible_cpu(cpu) {
+ struct dlock_list_head_percpu *head;
+
+ head = per_cpu_ptr(dlist_tmp.head, cpu);
+ INIT_LIST_HEAD(&head->list);
+ head->lock = __SPIN_LOCK_UNLOCKED(&head->lock);
+ lockdep_set_class(&head->lock,&dlock_list_key);
+ }
+
+ dlist->head = dlist_tmp.head;
Just use dlist->head directly or use local __perpcu head pointer?

I just don't want to expose the structure to world until it is fully initialized. If you think I am over-cautious, I can use dlist->head as suggested.

+ return 0;
+}
+EXPORT_SYMBOL(alloc_dlock_list_head);
Does this actually need to be exported? If so, it might be a better
idea to start with EXPORT_SYMBOL_GPL().

For the current use case, we probably don't need to export the symbols. Other use cases may require that. I will change it to use the version instead.

+void dlock_list_add(struct dlock_list_node *node,
+ struct dlock_list_head *dlist)
+{
+ struct dlock_list_head_percpu *head;
^

This probably requires __percpu annotation. Have you run it through
sparse and checked for address space warnings?

You are right. I probably miss the __percpu annotation. I haven't run it through sparse. I will do that to fix any warning found.

+
+ /*
+ * Disable preemption to make sure that CPU won't gets changed.
+ */
+ head = get_cpu_ptr(dlist->head);
+ spin_lock(&head->lock);
+ node->lockptr =&head->lock;
+ list_add(&node->list,&head->list);
+ spin_unlock(&head->lock);
+ put_cpu_ptr(dlist->head);
+}
+EXPORT_SYMBOL(dlock_list_add);
+
+/**
+ * dlock_list_del - Delete a node from a dlock list
+ * @node : The node to be deleted
+ *
+ * We need to check the lock pointer again after taking the lock to guard
+ * against concurrent deletion of the same node. If the lock pointer changes
+ * (becomes NULL or to a different one), we assume that the deletion was done
+ * elsewhere. A warning will be printed if this happens as it is likely to be
+ * a bug.
+ */
+void dlock_list_del(struct dlock_list_node *node)
+{
+ spinlock_t *lock = READ_ONCE(node->lockptr);
+
+ if (unlikely(!lock)) {
+ WARN_ONCE(1,
+ "dlock_list_del: node 0x%lx has no associated lock\n",
+ (unsigned long)node);
Maybe "if (WARN_ONCE(!lock...)"? WARN_ONCE implies unlikely.

OK, will do that.

+ return;
+ }
+
+ spin_lock(lock);
+ if (likely(lock == node->lockptr)) {
+ list_del_init(&node->list);
+ node->lockptr = NULL;
+ } else {
+ /*
+ * This path should never be executed.
+ */
+ WARN_ON_ONCE(1);
+ }
This still kinda bothers me because this pretty much requires the
users to have strong synchronization around the operations and makes
it unusable in situations where opportunistic behaviors are
acceptable. It negates the usefulness quite a bit.

I understand your concern. I will make it retry again with the new lock.

+ spin_unlock(lock);
+}
+EXPORT_SYMBOL(dlock_list_del);
+
+/*
+ * Helper function to find the first entry of the next per-cpu list
+ * It works somewhat like for_each_possible_cpu(cpu).
+ *
+ * Return: true if the entry is found, false if all the lists exhausted
+ *
+ */
+static inline bool dlock_list_next_cpu(struct dlock_list_head *dlist,
^
Just let the compiler figure it out.

Sure. Will remove the inline tag.

+ struct dlock_list_iter *iter)
+{
+ if (iter->lock)
+ spin_unlock(iter->lock);
+next_cpu:
+ /*
+ * for_each_possible_cpu(cpu)
+ */
+ iter->cpu = cpumask_next(iter->cpu, cpu_possible_mask);
+ if (iter->cpu>= nr_cpu_ids)
+ return false; /* All the per-cpu lists iterated */
+
+ iter->head =&per_cpu_ptr(dlist->head, iter->cpu)->list;
+ if (list_empty(iter->head))
+ goto next_cpu;
+
+ iter->lock =&per_cpu_ptr(dlist->head, iter->cpu)->lock;
+ spin_lock(iter->lock);
+ /*
+ * There is a slight chance that the list may become empty just
+ * before the lock is acquired. So an additional check is
+ * needed to make sure that iter->curr points to a valid entry.
+ */
+ if (list_empty(iter->head)) {
+ spin_unlock(iter->lock);
+ goto next_cpu;
+ }
+ iter->curr = list_entry(iter->head->next,
+ struct dlock_list_node, list);
+ return true;
+}
...
+/**
+ * dlock_list_next_safe - Removal-safe iterator of dlock list
+ * @dlist: Pointer to the dlock_list_head structure
+ * @iter : Pointer to the dlock list iterator structure
+ * Return: true if the next entry is found, false if all the entries iterated
+ *
+ * The iterator has to be properly initialized before calling this function.
+ * This iteration function is safe with respect to list entry removal.
+ * However, it cannot correctly iterate newly added entries right after the
+ * current one.
+ */
This still looks wrong to me. If you want to provide the two variants
of iterations, can't you just implement one next function and build
the two types of iterations on top of it?

I have been thinking about making dlock_list_next_cpu() the real external function and have 2 inline functions that implement dlock_list_next() and dlock_list_next_safe(). That may strike a better balance between performance and code abstraction. I will do so if you have no objection to that.

Cheers,
Longman