Re: Improving AIO cancellation

From: Kent Overstreet
Date: Fri Feb 08 2013 - 16:11:56 EST

On Fri, Feb 08, 2013 at 09:38:11AM -0800, Zach Brown wrote:
> > The draft implementation will look like this. struct bio should have
> > some way to get current status of kiocb that generated bio. So we add
> > a pointer to bool flag.
> >
> > struct bio {
> > bool *cancelled;
> > }
> >
> > in async DIO codepath this pointer will be initialized with bool at
> > "struct kiocb"
> > bio->cancelled = &kiocb->cancelled;
> Hmm. I suppose. It sure looks disgusting, but the iocb has forgotten
> all of the dio state and bios that lead up to the operation once
> submission has completed. It's easy enough to match the lifetime rules
> of this reference with end_io's using the iocb.

That's part of it (w.r.t. the exact location of the cancelled flag), but
the idea also isn't for this to be specific to kiocbs or anything like

The problem, as you alluded to, is once a bio passes
generic_make_request() the upper layer has no idea what the block layer
is doing with it. If it's a stacking block device - or even if it's just
getting bounced - the first thing that happens to the bio is it gets

So, adding a mechanism for the upper layer to chase down those bios and
whatever other state the lower layers created would be gross. Fuck that.
And, you know my thoughts on callbacks; Ted pointed out some reasons we
are probably going to need cancellation callbacks eventually but I
personally _don't_ want this to get designed around callbacks.

But if we just add this layer of indirection, when a bio is cloned we
can copy the pointer too and cancellation magically Just Works.

For md writes and probably some other things we won't be able to just
blindly copy the pointer as that'd fuck up md's parity stuff, but md
could still check the flag itself if it was worth the trouble.

> This method also lets one cancelled iocb flag many submitted bios as
> cancelled, so that's nice.
> > So to cancel kiocb we do
> > kiocb->cancelled = true;
> > and all bio created from the request will not be send to device anymore.
> (With lots of comments to let us know that it being racey is fine.)

Yeah. We should be really explicit about what this flag means; it
_doesn't_ mean "this bio has been cancelled", it means "please try to
cancel this bio".

We don't know if the bio was succesfully cancelled until the bio is
completed (and this doesn't change anything about how bio completion
works) - if it was cancelled, the bi_end_io callback will get
-ECANCELLED or something.

This is very different from the existing AIO cancellation; KIF_CANCEL
means "this kiocb _has been cancelled_". (sort of, I was just rereading
that code and I'm convinced it's buggy).

> > If the proposal is fine then I start implementing it.
> For a teeny hack like this the best proposal is a working prototype
> patch. Don't wait for acks just dive in.

Yeah, and I keep claiming I'm going to implement the AIO bits of this
(I keep getting distracted by other things like taking a flamethrower to
the dio code).

Note that the semantics of this doesn't really match up with
io_cancel(). That's not the end of the world, we can paper over that in
the aio code without too much grossness (or at least it'll be localized

IMO though, io_cancel() is dumb and we want something new. It's
synchronous - and why anyone ever thought cancellation for
_asynchronous_ io should be synchronous I don't know.

Anyways, if io_cancel() is going to succeed it has to block until the io
has been cancelled and it's got a completion - the completion isn't
returned via io_getevents().

This makes _no sense_, and means an application that is making use of
this probably is going to need a thread pool just to do cancellation.

What we want is a new io_cancel_sane() syscall that doesn't return
anything, and only marks the iocb as "cancellation pending". The
completion would still get returned via io_getevents(), and userspace
would find out if it was cancelled or not then.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at