Re: [GIT PULL] VFIO fixes for v4.1-rc2

From: Alex Williamson
Date: Fri May 01 2015 - 18:03:50 EST


On Fri, 2015-05-01 at 13:23 -0700, Linus Torvalds wrote:
> On Fri, May 1, 2015 at 11:48 AM, Alex Williamson
> <alex.williamson@xxxxxxxxxx> wrote:
> >
> > Ok. It seemed like useful behavior to be able to provide some response
> > to the user in the event that a ->remove handler is blocked by a device
> > in-use and the user attempts to abort the action.
>
> Well, that kind of notification *might* be useful, but at the cost of
> saying "somebody tried to send you a signal, so I am now telling you
> about it, and then deleting that signal, and you'll never know what it
> actually was"?
>
> That's not useful, that's just wrong.

Yep, it was a bad idea.

> Now, what might in theory be useful - but I haven't actually seen
> anybody do anything like that - is to start out with an interruptible
> sleep, warn if you get interrupted, and then continue with an
> un-interruptible sleep (leaving the signal active).

I was considering doing exactly this.

> But even that sounds like a very special case, and I don't think
> anything has ever done that.
>
> In general, our signal handling falls into three distinct categories:
>
> (a) interruptible (and you can cancel the operation and return "try again")
>
> (b) killable (you can cancel the operation, knowing that the
> requester will be killed and won't try again)
>
> (c) uninterruptible
>
> where that (b) tends to be a special case of an operation that
> technically isn't really interruptible (because the ABI doesn't allow
> for retrying or error returns), but knowing that the caller will never
> see the error case because it's killed means that you can do it. The
> classic example of that is an NFS mount that is mounted "nointr" - you
> can't return EINTR for a read or a write (because that invalidates
> POSIX) but you want to let SIGKILL just kill the process in the middle
> when the network is hung.
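
The three categories above can be condensed into one decision: given the
sleep mode, does a pending signal end the wait? Below is a minimal
userspace sketch of that decision, loosely modelled on the kernel's
interruptible/killable/uninterruptible task states. The enum and the
function name wait_should_abort() are hypothetical and exist only for
illustration; this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three categories; not kernel identifiers. */
enum wait_mode {
	WAIT_INTERRUPTIBLE,   /* (a) any signal cancels; the caller may retry */
	WAIT_KILLABLE,        /* (b) only a fatal signal cancels the wait */
	WAIT_UNINTERRUPTIBLE, /* (c) no signal cancels the wait */
};

/*
 * Return true if a sleep in the given mode should be abandoned.
 *
 * signal_pending: at least one signal is queued for the task.
 * fatal_pending:  a fatal signal (e.g. SIGKILL) is among them.
 */
static bool wait_should_abort(enum wait_mode mode, bool signal_pending,
			      bool fatal_pending)
{
	if (!signal_pending)
		return false;

	switch (mode) {
	case WAIT_INTERRUPTIBLE:
		return true;
	case WAIT_KILLABLE:
		return fatal_pending;
	case WAIT_UNINTERRUPTIBLE:
		return false;
	}

	/* All enum values are handled above. */
	assert(!"unknown wait_mode");
	return false;
}
```

In this model the "nointr" NFS case is WAIT_KILLABLE: an ordinary signal
leaves the read or write sleeping, while SIGKILL ends it.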

I think we're in that (c) case unless we want to change our driver API
to allow driver remove callbacks to return an error. Killing the caller
doesn't really help the situation without being able to back out of the
remove path. Killing the task with the device open would help, but
seems rather harsh. I expect we eventually want to be able to escalate
to revoking the device from the user, but currently we only have a
notifier to request the device from cooperative users. In the event of
an uncooperative user, we block, which can be difficult to figure out,
especially when we're dealing with SR-IOV devices and a PF unbind
implicitly induces a VF unbind. The interruptible component here is
simply a logging mechanism which should have turned into an
"interruptible_once" rather than a signal flush.
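
To make the "interruptible_once" idea concrete, here is a rough userspace
sketch of the control flow only: wait interruptibly until the first
signal, warn once, then keep waiting uninterruptibly without ever
clearing the signal. Every name here (wait_for_release, wait_fn,
fake_wait, struct fake_user) is hypothetical, and the callback merely
stands in for a bounded kernel wait; this is not the vfio implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome of one bounded wait. */
enum wait_result { WAIT_DONE, WAIT_TIMED_OUT, WAIT_SIGNALLED };

/* Hypothetical callback: one bounded wait that honours signals only if asked. */
typedef enum wait_result (*wait_fn)(bool interruptible, void *ctx);

/*
 * Block until the device is released. The first signal only triggers a
 * one-time warning; after that the wait is uninterruptible. The signal
 * is never cleared here, so it is still pending when we return.
 * Returns the number of warnings logged (0 or 1).
 */
static int wait_for_release(wait_fn wait_once, void *ctx)
{
	bool interrupted = false;
	int warnings = 0;

	for (;;) {
		enum wait_result r = wait_once(!interrupted, ctx);

		if (r == WAIT_DONE)
			return warnings;
		if (r == WAIT_SIGNALLED && !interrupted) {
			interrupted = true;
			warnings++;
			fprintf(stderr, "device in use; remove is blocked until it is released\n");
		}
		/* On timeout, or once already interrupted, just wait again. */
	}
}

/* Demo stand-in for a user who releases the device after a few waits. */
struct fake_user {
	int calls;           /* number of waits performed so far */
	int release_after;   /* wait number on which the device is released */
	bool signal_pending; /* a signal is queued and is never consumed */
};

static enum wait_result fake_wait(bool interruptible, void *ctx)
{
	struct fake_user *u = ctx;

	u->calls++;
	if (u->calls >= u->release_after)
		return WAIT_DONE;
	if (interruptible && u->signal_pending)
		return WAIT_SIGNALLED;
	return WAIT_TIMED_OUT;
}
```

With a fake_user whose signal is pending and whose release_after is 4,
wait_for_release(fake_wait, &u) warns exactly once, performs four waits,
and leaves signal_pending set for the caller to handle.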

I try to avoid vfio being a special case, but maybe in this instance
it's worthwhile. If you have other suggestions, please let me know.
Thanks,

Alex
