Re: [PATCH] vmpressure: fix divide-by-0 in vmpressure_work_fn

From: Michal Hocko
Date: Thu Sep 12 2013 - 07:46:08 EST


On Wed 11-09-13 13:04:33, Hugh Dickins wrote:
> On Wed, 11 Sep 2013, Michal Hocko wrote:
[...]
> > From 888745909da34f8aee8a208a82d467236b828d0d Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@xxxxxxx>
> > Date: Wed, 11 Sep 2013 17:48:10 +0200
> > Subject: [PATCH] vmpressure: fix divide-by-0 in vmpressure_work_fn
> >
> > Hugh Dickins has reported a division by 0 when a vmpressure event is
> > processed. The reason for the exception is that a single vmpressure
> > work item (which is per memcg) might be processed by multiple CPUs
> > because it is enqueued on system_wq which is !WQ_NON_REENTRANT.
> > This means that the out of lock vmpr->scanned check in
> > vmpressure_work_fn is inherently racy and the racing workers will see
> > already zeroed scanned value after they manage to take the spin lock.
> >
> > The patch simply moves the vmp->scanned check inside the sr_lock to fix
> > the race.
> >
> > The issue was there since the very beginning but "vmpressure: change
> > vmpressure::sr_lock to spinlock" might have made it more visible as the
> > racing workers would sleep on the mutex and give it more time to see
> > updated value. The issue was still there, though.
> >
> > Reported-by: Hugh Dickins <hughd@xxxxxxxxxx>
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
>
> Nack! But equally Nack to my original.
>
> Many thanks for looking into how this might have happened, Michal,
> and for mentioning the WQ_NON_REENTRANT flag: which I knew nothing
> about, but have now followed up.
> I owe you all an abject apology: what I didn't mention in my patch
> was that actually I hit the problem on a v3.3-based kernel to which
> vmpressure had been backported.
>
> I have not yet seen the problem on v3.11 or v3.10, and now believe
> that it cannot happen there - which would explain why I was the
> first to hit it.
>
> When I looked up WQ_NON_REENTRANT in the latest tree, I found
> WQ_NON_REENTRANT = 1 << 0, /* DEPRECATED */
> and git blame on that line leads to Tejun explaining
>
> dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
> WQ_NON_REENTRANT no-op but the following patches didn't remove the
> flag or update the documentation. Let's mark the flag deprecated and
> update the documentation accordingly.

Goon point. I didn't check the code and relied on the documentation.
Thanks for pointing this out.

> dbf2576e37 went into v3.7, so I now believe this divide-by-0 could
> only happen on a backport of vmpressure to an earlier kernel than that.

git grep WQ_NON_REENTRANT on kernel/workqueue.c really shows nothing so
I guess you are right.

Andrew, please drop the patch.
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/