Re: linux-next oops in __lock_acquire for process_one_work

From: Peter Zijlstra
Date: Tue May 08 2012 - 09:03:50 EST


On Mon, 2012-05-07 at 10:57 -0700, Tejun Heo wrote:
> (cc'ing Peter and Ingo and quoting whole body)
>
> On Mon, May 07, 2012 at 10:19:09AM -0700, Hugh Dickins wrote:
> > Running MM load on recent linux-nexts (e.g. 3.4.0-rc5-next-20120504),
> > with CONFIG_PROVE_LOCKING=y, I've been hitting an oops in __lock_acquire
> > called from lock_acquire called from process_one_work: serving mm/swap.c's
> > lru_add_drain_all - schedule_on_each_cpu(lru_add_drain_per_cpu).
> >
> > In each case the oopsing address has been ffffffff00000198, and the
> > oopsing instruction is the "atomic_inc((atomic_t *)&class->ops)" in
> > __lock_acquire: so class is ffffffff00000000.
> >
> > I notice Stephen's commit 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> > workqueue: Catch more locking problems with flush_work()
> > in linux-next but not 3.4-rc, adding
> > lock_map_acquire(&work->lockdep_map);
> > lock_map_release(&work->lockdep_map);
> > to flush_work.
> >
> > I believe that occasionally races with your
> > struct lockdep_map lockdep_map = work->lockdep_map;
> > in process_one_work, putting an entry into the class_cache
> > just as you're copying it, so you end up with half a pointer.
> > yes, the structure copy is using "rep movsl" not "rep movsq".

But the copy is copying from work->lockdep_map, not to, so it doesn't
matter does it? If anything would explode it would be the:

lock_map_acquire(&lockdep_map);

because that's the target of the copy and could indeed observe a partial
update (assuming the reported but silly "rep movsl").

> The offending commit is 0976dfc1d0cd80a4e9dfaf87bd8744612bde475a
> "workqueue: Catch more locking problems with flush_work()". It sounds
> fancy but all it does is adding the following to flush_work().
>
> lock_map_acquire(&work->lockdep_map);
> lock_map_release(&work->lockdep_map);
>
> Which seems correct to me and more importantly not different from what
> wait_on_work() does, so if this is broken, flush_work_sync() and
> cancel_work_sync() are broken too - probably masked by lower usage
> frequency.
>
> It seems the problem stems from how process_one_work() "caches"
> lockdep_map. This part predates cmwq changes but it seems necessary
> because the work item may be freed during execution but lockdep_map
> should be released after execution is complete.

Exactly.

> Peter, do you
> remember how this lockdep_map copying is added? Is (or was) this
> correct? If it's broken, how do we fix it? Add a lockdep_map copy
> API which does some magic lockdep locking dancing?

I think there's a problem if indeed we do silly things like small copies
like Hugh saw (why would gcc ever generate small copies for objects that
are naturally aligned and naturally sized?).

Something like the below should fix that problem, but it doesn't explain
the observed issue..

---
include/linux/lockdep.h | 19 +++++++++++++++++++
kernel/timer.c | 2 +-
kernel/workqueue.c | 2 +-
3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index d36619e..dc6661b 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -157,6 +157,25 @@ struct lockdep_map {
#endif
};

+static inline struct lockdep_map lockdep_copy_map(struct lockdep_map *lock)
+{
+ struct lockdep_map _lock = *lock;
+ int i;
+
+ /*
+ * Since the class cache can be modified concurrently we could observe
+ * half pointers (64bit arch using 32bit copy insns). Therefore clear
+ * the caches and take the performance hit.
+ *
+ * XXX it doesn't work well with lockdep_set_class_and_subclass(), since
+ * that relies on cache abuse.
+ */
+ for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
+ _lock.class_cache[i] = NULL;
+
+ return _lock;
+}
+
/*
* Every lock has a list of other locks that were taken after it.
* We only grow the list, never remove from it:
diff --git a/kernel/timer.c b/kernel/timer.c
index a297ffc..fa98821 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1102,7 +1102,7 @@ static void call_timer_fn(struct timer_list *timer, void (*fn)(unsigned long),
* warnings as well as problems when looking into
* timer->lockdep_map, make a copy and use that here.
*/
- struct lockdep_map lockdep_map = timer->lockdep_map;
+ struct lockdep_map lockdep_map = lockdep_map_copy(&timer->lockdep_map);
#endif
/*
* Couple the lock chain with the lock chain at
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 5abf42f..5d92b43 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1810,7 +1810,7 @@ __acquires(&gcwq->lock)
* lock freed" warnings as well as problems when looking into
* work->lockdep_map, make a copy and use that here.
*/
- struct lockdep_map lockdep_map = work->lockdep_map;
+ struct lockdep_map lockdep_map = lockdep_copy_map(&work->lockdep_map);
#endif
/*
* A single work shouldn't be executed concurrently by

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/