Re: [PATCH] 3/4 combine RCU with seqlock to allow mmu notifiermethods to sleep (#v9 was 1/4)

From: Christoph Lameter
Date: Fri Mar 07 2008 - 15:00:58 EST


On Fri, 7 Mar 2008, Andrea Arcangeli wrote:

> This combines the non-sleep-capable RCU locking of #v9 with a seqlock
> so the mmu notifier fast path will require zero cacheline
> writes/bouncing while still providing mmu_notifier_unregister and
> allowing to schedule inside the mmu notifier methods. If we drop
> mmu_notifier_unregister we can as well drop all seqlock and
> rcu_read_lock()s. But this locking scheme combination is sexy enough
> and 100% scalable (the mmu_notifier_list cacheline will be preloaded
> anyway and that will most certainly include the sequence number value
> in l1 for free even in Christoph's NUMA systems) so IMHO it worth to
> keep mmu_notifier_unregister.

Well its adds lots of processing. Not sure if its really worth it. Seems
that this scheme cannot work since the existence of the structure passed
to the callbacks is not guaranteed since the RCU locks are not held. You
need some kind of a refcount to give the existence guarantee.

> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -20,7 +20,9 @@ void __mmu_notifier_release(struct mm_st
> void __mmu_notifier_release(struct mm_struct *mm)
> {
> struct mmu_notifier *mn;
> + unsigned seq;
>
> + seq = read_seqbegin(&mm->mmu_notifier_lock);
> while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) {
> mn = hlist_entry(mm->mmu_notifier_list.first,
> struct mmu_notifier,
> @@ -28,6 +30,7 @@ void __mmu_notifier_release(struct mm_st
> hlist_del(&mn->hlist);
> if (mn->ops->release)
> mn->ops->release(mn, mm);
> + BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq));
> }
> }

So this is only for sanity checking? The BUG_ON detects concurrent
operations that should not happen? Need a comment here.


> @@ -42,11 +45,19 @@ int __mmu_notifier_clear_flush_young(str
> struct mmu_notifier *mn;
> struct hlist_node *n;
> int young = 0;
> + unsigned seq;
>
> rcu_read_lock();
> +restart:
> + seq = read_seqbegin(&mm->mmu_notifier_lock);
> hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) {
> - if (mn->ops->clear_flush_young)
> + if (mn->ops->clear_flush_young) {
> + rcu_read_unlock();
> young |= mn->ops->clear_flush_young(mn, mm, address);
> + rcu_read_lock();
> + }
> + if (read_seqretry(&mm->mmu_notifier_lock, seq))
> + goto restart;

Great innovative idea of the seqlock for versioning checks.

> }
> rcu_read_unlock();
>

Well that gets pretty sophisticated here. If you drop the rcu lock then
the entity pointed to by mn can go away right? So how can you pass that
structure to clear_flush_young? What is guaranteeing the existence of the
structure?


> @@ -58,11 +69,19 @@ void __mmu_notifier_invalidate_page(stru
> {
> struct mmu_notifier *mn;
> struct hlist_node *n;
> + unsigned seq;
>
> rcu_read_lock();
> +restart:
> + seq = read_seqbegin(&mm->mmu_notifier_lock);
> hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) {
> - if (mn->ops->invalidate_page)
> + if (mn->ops->invalidate_page) {
> + rcu_read_unlock();
> mn->ops->invalidate_page(mn, mm, address);

Ditto structure can vanish since no existence guarantee exists.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/