Re: [RFC] Heads up on a series of AIO patchsets

From: Zach Brown
Date: Tue Jan 02 2007 - 18:56:31 EST


Sorry for the delay, I'm finally back from the holiday break :)

> (1) The filesystem AIO patchset, attempts to address one part of the problem
> which is to make regular file IO, (without O_DIRECT) asynchronous (mainly
> the case of reads of uncached or partially cached files, and O_SYNC writes).

One of the properties of the currently implemented EIOCBRETRY aio path is that ->mm is the only field in current which matches the submitting task_struct while inside the retry path.

It looks like a retry-based aio write path would be broken because of this. generic_write_checks() could run in the aio thread and get its task_struct instead of that of the submitter. The wrong rlimit will be tested and SIGXFSZ won't be raised. remove_suid() could check the capabilities of the aio thread instead of those of the submitter.
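
To make the failure concrete, here is a rough userspace model of the problem, not kernel code. struct task, current_task, write_checks() and retry_write() are all made-up names for illustration. write_checks() only mimics the rlimit test in generic_write_checks() in a much simplified form, and retry_write() assumes the retry path behaves as described above: the aio thread borrows the submitter's ->mm and nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct; only what the example needs. */
struct task {
    const char *comm;
    unsigned long fsize_limit;  /* models the RLIMIT_FSIZE soft limit */
    bool sigxfsz_pending;       /* set where SIGXFSZ would be raised */
    int mm_id;                  /* models ->mm */
};

/* Models the kernel's "current". */
static struct task *current_task;

/*
 * Simplified model of the rlimit test in generic_write_checks().
 * It consults whatever "current" happens to be at the time.
 */
static int write_checks(unsigned long pos, unsigned long count)
{
    if (pos + count > current_task->fsize_limit) {
        current_task->sigxfsz_pending = true;
        return -1;  /* the kernel would return -EFBIG here */
    }
    return 0;
}

/*
 * Model of a retry run from the aio thread: the worker adopts the
 * submitter's mm, but "current" is still the worker.
 */
static int retry_write(struct task *worker, const struct task *submitter,
                       unsigned long pos, unsigned long count)
{
    struct task *saved_current = current_task;
    int saved_mm = worker->mm_id;
    int ret;

    worker->mm_id = submitter->mm_id;  /* the only field that matches */
    current_task = worker;
    ret = write_checks(pos, count);    /* tests the worker's rlimit */

    worker->mm_id = saved_mm;
    current_task = saved_current;
    return ret;
}
```

With a submitter limited to 4096 bytes and a worker with no limit, an 8192 byte write fails when write_checks() is called directly as the submitter, but goes through retry_write() without complaint and without the submitter ever seeing the signal.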

I don't think EIOCBRETRY is the way to go because of this increased (and subtle!) complexity. What are the chances that we would have ever found those bugs outside code review? How do we make sure that current references don't sneak back in after having initially audited the paths?

Take the io_cmd_epoll_wait patch...

> issues). The IO_CMD_EPOLL_WAIT patch (originally from Zach Brown with
> modifications from Jeff Moyer and me) addresses this problem for native
> linux aio in a simple manner.

It's simple looking, sure. The fact that current gets flipped didn't even occur to me while throwing the patch together!

But that patch ends up calling ->poll (and poll_table->qproc) and writing to userspace (so potentially calling ->nopage) from the aio threads. Are we sure that none of them will behave surprisingly because current changed under them?

It might be safe now, but that isn't really the point. I'd rather we didn't have yet one more subtle invariant to audit and maintain.

At the risk of making myself vulnerable to the charge of mentioning vapourware, I will admit that I've been working on a (slightly mad) implementation of async syscalls. I've been quiet about it because I don't want to whip up complicated discussion without being able to show code that works, even if barely. I mention it now only to make it clear that I want to be constructive, not just critical :).

- z