Re: [BUG] mm: backing-dev: a possible sleep-in-atomic-context bug in cgwb_create()

From: Jan Kara
Date: Fri Jun 22 2018 - 04:50:45 EST


On Wed 20-06-18 20:35:15, Matthew Wilcox wrote:
> On Thu, Jun 21, 2018 at 11:02:58AM +0800, Jia-Ju Bai wrote:
> > The kernel may sleep while holding a spinlock.
> > The function call path (from bottom to top) in Linux-4.16.7 is:
> >
> > [FUNC] schedule
> > lib/percpu-refcount.c, 222:
> > schedule in __percpu_ref_switch_mode
> > lib/percpu-refcount.c, 339:
> > __percpu_ref_switch_mode in percpu_ref_kill_and_confirm
> > ./include/linux/percpu-refcount.h, 127:
> > percpu_ref_kill_and_confirm in percpu_ref_kill
> > mm/backing-dev.c, 545:
> > percpu_ref_kill in cgwb_kill
> > mm/backing-dev.c, 576:
> > cgwb_kill in cgwb_create
> > mm/backing-dev.c, 573:
> > _raw_spin_lock_irqsave in cgwb_create
> >
> > This bug is found by my static analysis tool (DSAC-2) and checked by my
> > code review.
>
> I disagree with your code review.
>
> * If the previous ATOMIC switching hasn't finished yet, wait for
> * its completion. If the caller ensures that ATOMIC switching
> * isn't in progress, this function can be called from any context.
>
> I believe cgwb_kill is always called under the spinlock, so we will never
> sleep because the percpu ref will never be switching to atomic mode.

You are right that the sleep under spinlock never happens. The reason is
that percpu_ref_kill() never blocks: it does call
percpu_ref_kill_and_confirm(), but the 'confirm' argument is NULL, and thus
even percpu_ref_kill_and_confirm() never blocks.
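
To make the call shape concrete, here is a minimal userspace sketch. It is
not the kernel code: the struct and its 'saw_confirm' field are hypothetical,
and the body of percpu_ref_kill_and_confirm() is a stand-in that only records
its arguments. The one thing it is meant to show is that the
percpu_ref_kill() wrapper hands a NULL confirm callback down.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: this is NOT the kernel's struct percpu_ref. */
struct percpu_ref;
typedef void (percpu_ref_func_t)(struct percpu_ref *);

struct percpu_ref {
	bool dead;		/* set once the ref has been killed */
	bool saw_confirm;	/* hypothetical: did the last kill pass a callback? */
};

/*
 * Stand-in body (assumption): records its arguments instead of performing
 * the real mode switch that lib/percpu-refcount.c does.
 */
static void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
					percpu_ref_func_t *confirm_kill)
{
	ref->dead = true;
	ref->saw_confirm = (confirm_kill != NULL);
}

/* Same shape as the inline wrapper in include/linux/percpu-refcount.h. */
static inline void percpu_ref_kill(struct percpu_ref *ref)
{
	percpu_ref_kill_and_confirm(ref, NULL);
}
```

Calling percpu_ref_kill() on a ref in this model leaves 'dead' set and
'saw_confirm' clear, i.e. no confirmation callback was ever requested.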

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR