Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu

From: Paul E. McKenney
Date: Sat Dec 05 2015 - 21:19:11 EST


On Mon, Nov 30, 2015 at 12:30:06PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 30, 2015 at 10:39:15AM +1100, Paul Mackerras wrote:
> > On Wed, Nov 18, 2015 at 11:13:28AM -0800, Paul E. McKenney wrote:
> > > On Fri, Nov 06, 2015 at 01:17:17PM +1100, Alexey Kardashevskiy wrote:
> > [snip]
> > > > Still, is my approach correct? What does the comment for
> > > > lockless_dereference() actally mean - it won't work together with
> > > > RCU at all or this is to force people not to use it as
> > > > "list_for_each_entry_rcu() should really be used in 99.99% of the
> > > > time"? :)
> > >
> > > Well, it depends...
> > >
> > > The key difference between lockless_dereference() and rcu_dereference()
> > > is that lockless_dereference() won't complain if used outside of
> > > an RCU read-side critical section. When there is no RCU read-side
> > > critical section, lockless_dereference() cannot rely on RCU's normal
> > > action of keeping the data item around. Therefore, when you are using
> > > lockless_dereference(), you have to have some mechanism other than RCU
> > > to keep the data item around. The usual approach is for the data item
> > > to never be freed, for example, if data is only ever added to the list
> > > and never removed. Other approaches include reference counting, hazard
> > > pointers, hashed arrays of locks, garbage collectors, transactional
> > > memory, and so on.
> >
> > So, the situation is that we have an RCU-protected list, which in this
> > case we are traversing without modifying the list. We are in an
> > restricted environment (hypervisor real mode) where we can't be
> > preempted, both because interrupts are hard-disabled and because our
> > caller has done preempt_disable(). In this restricted environment we
> > can access the linear mapping (thus kmalloc'd data) but not the
> > vmalloc or ioremap regions.
> >
> > Thus we are not formally in a RCU read-side critical section, though
> > we believe that having preemption disabled gives us equivalent
> > protection. Probably what we should do is to add a
> > rcu_read_lock/unlock pair in a function higher up the call chain
> > so that we are actually in a RCU read-side critical section.
> >
> > Then the only reason not to use list_for_each_entry_rcu would be that
> > we don't trust the checking machinery not to ever access vmalloc'd
> > data. In other words, we want a list_for_each_entry_nocheck
> > or list_for_each_entry_restricted which omits all the lockdep
> > checking. That would need a list_entry_rcu_nocheck which would need a
> > thing like rcu_dereference_raw that does no lockdep checking - which
> > is where I thought you suggested lockless_dereference.
> >
> > So, what name do you like for these primitives, and where should they
> > go?
>
> I am good with list_for_each_entry_lockless() and with its using
> lockless_dereference(). I am also good with this going into
> include/linux/rculist.h. That said...
>
> o The name of the macro was last given as list_entry_lockless()
> rather than the name list_for_each_entry_lockless() that was
> used in the docbook comment. The latter is preferable because
> it is consistent with the other list-traversal macros.
>
> o The docbook comment (and I quote) "This list-traversal
> primitive may safely run concurrently" is clearly inadequate.
> The points it needs to make are:
>
> 1. This locklessly traverses a list.
>
> 2. It may be used when items are never removed from the list.
>
> 3. It may also be used in environments where preemption
> is implicitly disabled, but where lockdep cannot safely
> be invoked. In this case, after an element is removed,
> the synchronize_sched(), synchronize_sched_expedited(),
> or call_rcu_sched() functions must be used to wait for
> the needed grace period.
>
> o The commit log is also inadequate, though given a good patch,
> I can fix this. It needs some of the information that we have
> been sending back and forth in this email thread.
>
> It feels to me like we are going around in circles here. Someone needs
> to submit an updated patch to move this forward: I cannot in good faith
> accept the November 3rd patch. Are you or Alexey going to send this
> patch, or should I produce my best guess as to what you guys need?

As in the following? (And yes, I was confused by the fact that the
docbook header was in front of a differently-named macro!)

Thanx, Paul

------------------------------------------------------------------------

commit 428ac14295ea49853afa136c9120cc6fb554a030
Author: Alexey Kardashevskiy <aik@xxxxxxxxx>
Date: Sat Dec 5 18:14:19 2015 -0800

list: Add lockless list traversal primitives

Although list_for_each_entry_rcu() can in theory be used anywhere
preemption is disabled, it can result in calls to lockdep, which cannot
be used in certain constrained execution environments, such as exception
handlers that do not map the entire kernel into their address spaces.
This commit therefore adds list_entry_lockless() and
list_for_each_entry_lockless(), which never invoke lockdep and can
therefore safely be used from these constrained environments, but only
as long as those environments are non-preemptible (or items are never
deleted from the list).

Use synchronize_sched(), call_rcu_sched(), or synchronize_sched_expedited()
in updates for the needed grace periods. Of course, if items are never
deleted from the list, there is no need to wait for grace periods.

Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>

diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 5ed540986019..1fad79861e14 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -305,6 +305,42 @@ static inline void list_splice_init_rcu(struct list_head *list,
pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

/**
+ * list_entry_lockless - get the struct for this entry
+ * @ptr: the &struct list_head pointer.
+ * @type: the type of the struct this is embedded in.
+ * @member: the name of the list_head within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu(), but requires some implicit RCU
+ * read-side guarding. One example is running within a special
+ * exception-time environment where preemption is disabled and where
+ * lockdep cannot be invoked (in which case updaters must use RCU-sched,
+ * as in synchronize_sched(), call_rcu_sched(), and friends). Another
+ * example is when items are added to the list, but never deleted.
+ */
+#define list_entry_lockless(ptr, type, member) \
+ container_of((typeof(ptr))lockless_dereference(ptr), type, member)
+
+/**
+ * list_for_each_entry_lockless - iterate over rcu list of given type
+ * @pos: the type * to use as a loop cursor.
+ * @head: the head for your list.
+ * @member: the name of the list_struct within the struct.
+ *
+ * This primitive may safely run concurrently with the _rcu list-mutation
+ * primitives such as list_add_rcu(), but requires some implicit RCU
+ * read-side guarding. One example is running within a special
+ * exception-time environment where preemption is disabled and where
+ * lockdep cannot be invoked (in which case updaters must use RCU-sched,
+ * as in synchronize_sched(), call_rcu_sched(), and friends). Another
+ * example is when items are added to the list, but never deleted.
+ */
+#define list_for_each_entry_lockless(pos, head, member) \
+ for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
+ &pos->member != (head); \
+ pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
+
+/**
* list_for_each_entry_continue_rcu - continue iteration over list of given type
* @pos: the type * to use as a loop cursor.
* @head: the head for your list.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/