Re: mm: GPF in bdi_put

From: Jan Kara
Date: Wed Mar 01 2017 - 09:29:30 EST


On Mon 27-02-17 18:27:55, Al Viro wrote:
> On Mon, Feb 27, 2017 at 06:11:11PM +0100, Dmitry Vyukov wrote:
> > Hello,
> >
> > The following program triggers GPF in bdi_put:
> > https://gist.githubusercontent.com/dvyukov/15b3e211f937ff6abc558724369066ce/raw/cc017edf57963e30175a6a6fe2b8d917f6e92899/gistfile1.txt
>
> What happens is
> * attempt of, essentially, mount -t bdev ..., calls mount_pseudo()
> and then promptly destroys the new instance it has created.
> * the only inode created on that sucker (root directory, that
> is) gets evicted.
> * most of ->evict_inode() is harmless, until it gets to
> 	if (bdev->bd_bdi != &noop_backing_dev_info)
> 		bdi_put(bdev->bd_bdi);

Thanks for the analysis!

> added there by "block: Make blk_get_backing_dev_info() safe without open bdev".
> Since ->bd_bdi hadn't been initialized for that sucker (the same patch has
> placed initialization into bdget()), we step into shit of varying nastiness,
> depending on phase of moon, etc.

Yup, I missed that the root inode of the bdev superblock does not go through
bdget() (in fact I didn't think about what happens with the root inode of the
bdev superblock at all), so bd_bdi is left uninitialized in that case. I'll
send a fix for that in a while.
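
To make the failure mode concrete, here is a minimal userspace sketch of one
possible shape of such a fix. It is a model only, not kernel code and not
necessarily the patch that gets sent: the struct layouts are stand-ins, and
bdev_init_model() / bdev_evict_model() are hypothetical names. The idea is
that every bdev inode, including the root inode that never goes through
bdget(), starts with bd_bdi pointing at noop_backing_dev_info, so the check
in ->evict_inode() never sees garbage.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins: same names as the kernel objects, not their definitions. */
struct backing_dev_info {
	int refcnt;
};

static struct backing_dev_info noop_backing_dev_info = { .refcnt = 1 };

struct block_device {
	struct backing_dev_info *bd_bdi;
};

/* Simplified reference drop; the real bdi_put() does more than this. */
static void bdi_put(struct backing_dev_info *bdi)
{
	assert(bdi != NULL && bdi->refcnt > 0);
	bdi->refcnt--;
}

/*
 * Hypothetical init hook that runs for every bdev inode, including the
 * root inode of the bdev superblock that never passes through bdget().
 */
static void bdev_init_model(struct block_device *bdev)
{
	bdev->bd_bdi = &noop_backing_dev_info;
}

/* Models the check quoted above from ->evict_inode(). */
static void bdev_evict_model(struct block_device *bdev)
{
	if (bdev->bd_bdi != &noop_backing_dev_info) {
		bdi_put(bdev->bd_bdi);
		/* Reset so a reused inode starts from a known state. */
		bdev->bd_bdi = &noop_backing_dev_info;
	}
}
```

With the init hook in place, evicting an inode that was never opened skips
bdi_put() entirely instead of dereferencing whatever happened to be in the
field.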

> Could somebody explain WTF do we have those two lines in bdev_evict_inode(),
> anyway? We set ->bd_bdi to something other than noop_backing_dev_info only
> in __blkdev_get() when ->bd_openers goes from zero to positive, so why is
> the matching bdi_put() not in __blkdev_put()? Jan?

The problem is that writeback code (from flusher work or through sync(2);
generally any inode_to_bdi() user) can be looking at the bdev inode
independently of it being open. So if it starts looking while the bdev is
open but the dereference happens after the bdev is closed and the device
removed, we oops. We have seen oopses due to this for quite a while. And all
the stuff that is done in __blkdev_put() is not enough to prevent writeback
code from having a look at whether there is something to write.

So what we do now is that once we establish a valid bd_bdi reference, we
leave it alone until the bdev inode gets evicted. And to handle the case where
the underlying device actually changes, we unhash the bdev inode when the
device gets removed from the system so that it cannot be found by bdget()
anymore.
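
The lifetime rules above can be sketched as a small userspace model. This is
an illustration under assumptions, not the kernel implementation: the
*_model functions and the "hashed" flag are made up for the example, and
locking is omitted entirely. It shows the three points that matter: the
first open takes the bdi reference, the last close leaves it alone, and only
eviction drops it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins: same names as the kernel objects, not their definitions. */
struct backing_dev_info {
	int refcnt;
};

static struct backing_dev_info noop_backing_dev_info = { .refcnt = 1 };

struct block_device {
	struct backing_dev_info *bd_bdi;
	int bd_openers;
	bool hashed;	/* stands in for "can still be found by bdget()" */
};

static void bdev_init_model(struct block_device *bdev)
{
	bdev->bd_bdi = &noop_backing_dev_info;
	bdev->bd_openers = 0;
	bdev->hashed = true;
}

/* First open takes a reference on the device's bdi if none is held yet. */
static void blkdev_get_model(struct block_device *bdev,
			     struct backing_dev_info *bdi)
{
	if (bdev->bd_openers++ == 0 &&
	    bdev->bd_bdi == &noop_backing_dev_info) {
		bdi->refcnt++;
		bdev->bd_bdi = bdi;
	}
}

/* Last close deliberately keeps bd_bdi: writeback may still look at it. */
static void blkdev_put_model(struct block_device *bdev)
{
	assert(bdev->bd_openers > 0);
	bdev->bd_openers--;
}

/* Device removal: unhash so lookups create a fresh inode next time. */
static void device_removed_model(struct block_device *bdev)
{
	bdev->hashed = false;
}

/* Eviction is the only place where the bdi reference is dropped. */
static void bdev_evict_model(struct block_device *bdev)
{
	if (bdev->bd_bdi != &noop_backing_dev_info) {
		assert(bdev->bd_bdi->refcnt > 0);
		bdev->bd_bdi->refcnt--;
		bdev->bd_bdi = &noop_backing_dev_info;
	}
}
```

In this model an inode_to_bdi()-style reader that races with the last close
still finds a bdi pinned by the inode's reference, which is the property
putting bdi_put() into __blkdev_put() would not give us.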

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR