Re: [PATCH kernel] rcu: Define lockless version of list_for_each_entry_rcu

From: Steven Rostedt
Date: Tue Nov 03 2015 - 09:39:39 EST


On Tue, 3 Nov 2015 17:57:05 +1100
Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:

> This defines list_for_each_entry_lockless. This allows safe list
> traversing in cases when lockdep() invocation is unwanted like
> real mode (MMU is off).
>
> Signed-off-by: Alexey Kardashevskiy <aik@xxxxxxxxx>
> ---
>
> This is for VFIO acceleration in POWERKVM for pSeries guests.
> There is a KVM instance. There also can be some VFIO (PCI passthru)
> devices attached to a KVM guest.
>
> To perform DMA, a pSeries guest registers DMA memory by calling some
> hypercalls explicitely at the rate close to one-two hcalls per
> a network packet, i.e. very often. When a guest does a hypercall
> (which is just an assembly instruction), the host kernel receives it
> in the real mode (MMU is off). When real mode fails to handle it,
> it enables MMU and tries handling a hcall in virtual mode.
>
> A logical bus ID (LIOBN) is a tagret id for these hypecalls.
>
> Each VFIO device belongs to an IOMMU group. Each group has an address
> translation table. It is allowed to have multiple IOMMU groups (i.e.
> multiple tables) under the same LIOBN.
>
> So effectively every DMA hcall has to update one or more TCE tables
> attached to the same LIOBN. RCU is used to update/traverse this list
> safely.
>
> Using RCU as is in virtual mode is fine. Lockdep works, etc.
> list_add_rcu() is used to populate the list;
> list_del_rcu() + call_rcu() used to remove groups from a list.
> These operations can happen in runtim as a result of PCI hotplug/unplug
> in guests.
>
> Using RCU as is in real mode is not fine as some RCU checks can lock up
> the system and in real mode we won't even have a chance to see any
> debug. This is why rcu_read_lock() and rcu_read_unlock() are NOT used.
>
> Previous version of this used to define list_for_each_entry_rcu_notrace()
> but it was proposed to use list_entry_lockless() instead. However
> the comment for lockless_dereference() suggests this is a good idea
> if "lifetime is managed by something other than RCU" but it is in my case.
>
> So what would be the correct approach here? Thanks.

If the only use case for this so far is in POWERKVM, perhaps it should
be defined specifically (and in arch/powerpc) and not confuse others
about using this.

Or, if you do imagine that this can be used in other scenarios, then a
much deeper comment must be made in the code in the kerneldoc section.
list_for_each_entry_rcu() should really be used in 99.99% of the time
in the kernel. This looks to be an extreme exception. I hate to add a
generic helper for something that will only be used in one location.

-- Steve


> ---
> include/linux/rculist.h | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/include/linux/rculist.h b/include/linux/rculist.h
> index 17c6b1f..a83a924 100644
> --- a/include/linux/rculist.h
> +++ b/include/linux/rculist.h
> @@ -308,6 +308,22 @@ static inline void list_splice_init_rcu(struct list_head *list,
> pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
>
> /**
> + * list_for_each_entry_lockless - iterate over rcu list of given type
> + * @pos: the type * to use as a loop cursor.
> + * @head: the head for your list.
> + * @member: the name of the list_struct within the struct.
> + *
> + * This list-traversal primitive may safely run concurrently
> + */
> +#define list_entry_lockless(ptr, type, member) \
> + container_of((typeof(ptr))lockless_dereference(ptr), type, member)
> +
> +#define list_for_each_entry_lockless(pos, head, member) \
> + for (pos = list_entry_lockless((head)->next, typeof(*pos), member); \
> + &pos->member != (head); \
> + pos = list_entry_lockless(pos->member.next, typeof(*pos), member))
> +
> +/**
> * list_for_each_entry_continue_rcu - continue iteration over list of given type
> * @pos: the type * to use as a loop cursor.
> * @head: the head for your list.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/