Re: [PATCH] aio: fix a user triggered use after free (and fix freeze protection of aio writes)

From: Al Viro
Date: Sat Oct 29 2016 - 15:18:30 EST


On Sat, Oct 29, 2016 at 12:07:07PM -0700, Linus Torvalds wrote:
> On Sat, Oct 29, 2016 at 11:52 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > And that call can happen as soon as we return from __blockdev_direct_IO()
> > (even earlier, actually). As soon as that happens, the reference to
> > struct file we'd acquired in io_submit_one() is dropped. If descriptor
> > table had been shared, another thread might have already closed that sucker,
> > and fput() from aio_complete() would free struct file.
>
> But that's the point. We don't *do* anything like that any more. We
> now always do the final access from aio_complete(). So it doesn't
> matter if that is called asynchronously (very early) or not.
>
> That's the whole point of the patch. Exactly to do everything either
> *before* we even submit it (at which point no completion can happen),
> or doing it in aio_complete() which is guaranteed to be after the
> submission. No races, no use-after-free.
>
> What am I missing?

if (ret > 0)
ret = __generic_file_write_iter(iocb, from);
inode_unlock(inode);
in generic_file_write_iter() (inode might be gone here)
written = mapping->a_ops->direct_IO(iocb, &data);

/*
* Finally, try again to invalidate clean pages which might have been
* cached by non-direct readahead, or faulted in by get_user_pages()
* if the source of the write was an mmap'ed region of the file
* we're writing. Either one is a pretty crazy thing to do,
* so we don't support it 100%. If this invalidation
* fails, tough, the write still worked...
*/
if (mapping->nrpages) {
invalidate_inode_pages2_range(mapping,
pos >> PAGE_SHIFT, end);
in generic_file_direct_write() (mapping points into inode, which might be
freed)
ret = __blockdev_direct_IO(iocb, inode, target->bt_bdev, &data,
xfs_get_blocks_direct, NULL, NULL, 0);
if (ret >= 0) {
iocb->ki_pos += ret;
iov_iter_advance(to, ret);
}
xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
in xfs_file_dio_aio_read() (ip points to xfs-private part of inode).

And that - just from a quick look. We *do* access inode between the call
of __blockdev_direct_IO() and return from ->write_iter(). What's more,
as soon as aio_complete() has happened, what's to stop close from another
thread + umount + rmmod unmapping that ->write_iter() completely?

AFAICS, the possibility of dropping the last reference to struct file
before ->write_iter() has returned is fundamentally broken. I might be
missing something subtle here, but...