Re: mm: GPF in bdi_put

From: Jan Kara
Date: Wed Mar 01 2017 - 10:06:33 EST


On Wed 01-03-17 15:29:09, Jan Kara wrote:
> On Mon 27-02-17 18:27:55, Al Viro wrote:
> > On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> > > Hello,
> > >
> > > The following program triggers GPF in bdi_put:
> > > https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
> >
> > What happens is
> > * attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
> > and then promptly destroys the new instance it has created.
> > * the only inode created on that sucker (root directory, that
> > is) gets evicted.
> > * most of ->evict_inode() is harmless, until it gets to
> > if (bdev->bd_bdi != &noop_backing_dev_info)
> > bdi_put(bdev->bd_bdi);
>
> Thanks for the analysis!
>
> > added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
> > Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
> > placed initialization into bdget()), we step into shit of varying nastiness,
> > depending on phase of moon, etc.
>
> Yup, I've missed that the root inode of bdev superblock does not go through
> bdget() (in fact I didn't think what happens with root inode for bdev
> superblock at all) and thus bd_bdi is left uninitialized in that case. I'll
> send a fix for that in a while.
>
> > Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
> > anyway? We set ->bd_bdi to something other than noop_backing_dev_info only
> > in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
> > the matching bdi_put() not in __blkdev_put()? Jan?
>
> The problem is writeback code (from flusher work or through sync(2) -
> generally inode_to_bdi() users) can be looking at bdev inode independently
> from it being open. So if they start looking while the bdev is open but the
> dereference happens after it is closed and device removed, we oops. We have
> seen oopses due to this for quite a while. And all the stuff that is done
> in __blkdev_put() is not enough to prevent writeback code from having a
> look whether there is not something to write.
>
> So what we do now is that once we establish valid bd_bdi reference, we
> leave it alone until bdev inode gets evicted. And to handle the case when
> underlying device actually changes, we unhash bdev inode when the device
> gets removed from the system so that it cannot be found by bdget() anymore.

Attached patch fixes the problem for me. I'll post it officially tomorrow
once Al has a chance to reply...

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR