Re: [PATCH] aio: fix a user triggered use after free (and fix freeze protection of aio writes)

From: Al Viro
Date: Sat Oct 29 2016 - 12:12:43 EST


On Sat, Oct 29, 2016 at 05:20:17PM +0200, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote:
> > How about taking this chunk (i.e. telling lockdep that we are not holding this
> > thing) past the iter_op() call, where file_end_write() used to be?
>
> We can't as that would not fix the use after free (at least for the lockdep
> case - otherwise the call is a no-op). Once iter_op returns aio_complete
> might have dropped our reference to the file, and another thread might
> have closed the fd so that the fput from aio_complete was the last one.
>
> This is something that xfstests/323 can reproduce under the right conditions.

Eww... IOW, as soon as we'd submitted an async iocb, nobody can so much as
look at struct file *or* iocb, right? Or underlying inode, or any fs-private
data structures attached to it...

I certainly agree that it's a bug, but IMO you are just papering over it.
Just look at e.g.
written = mapping->a_ops->direct_IO(iocb, &data);

/*
* Finally, try again to invalidate clean pages which might have been
* cached by non-direct readahead, or faulted in by get_user_pages()
* if the source of the write was an mmap'ed region of the file
* we're writing. Either one is a pretty crazy thing to do,
* so we don't support it 100%. If this invalidation
* fails, tough, the write still worked...
*/
if (mapping->nrpages) {
invalidate_inode_pages2_range(mapping,
pos >> PAGE_SHIFT, end);
}
in generic_file_direct_write() - who said that mapping doesn't point into
freed memory by that point? Or that
inode_lock(inode);
ret = generic_write_checks(iocb, from);
if (ret > 0)
ret = __generic_file_write_iter(iocb, from);
inode_unlock(inode);
in generic_file_write_iter() won't blow up on inode_unlock() of a freed inode?
Or that anything of that sort won't happen in other ->write_iter instances,
for that matter?

NAK, with apologies for not having looked at that earlier. The bug is real,
all right, but this is not a solution - both incomplete and far too brittle.

Why do we play that kind of insane games, anyway? Why not e.g. refcount
the (async) iocb and have kiocb_free() drop the reference, with io_submit_one()
holding one across the call of aio_run_iocb()? Cacheline bouncing issues?
Anything more subtle?