Re: [V9fs-developer] INFO: task hung in grab_super

From: Dominique Martinet
Date: Thu Aug 02 2018 - 18:18:52 EST


Dmitry Vyukov via V9fs-developer wrote on Wed, Jul 18, 2018:
> >> Btw, I see that p9_client_rpc uses wait_event_killable, why wasn't it
> >> killed along with the whole process?
> >>
> >
> > wait_event_killable() would return -ERESTARTSYS if got SIGKILL.
> > But if (c->status == Connected) && (type == P9_TFLUSH) is also true,
> > it ignores SIGKILL by retrying the loop...
> >
> > again:
> > err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
> > if ((err == -ERESTARTSYS) && (c->status == Connected) && (type == P9_TFLUSH)) {
> > sigpending = 1;
> > clear_thread_flag(TIF_SIGPENDING);
> > goto again;
> > }
> >
> > I wish they don't ignore SIGKILL (by e.g. offloading operations to a kernel thread).
>
>
> I guess that's the problem, right? SIGKILL-ed task must not ignore
> SIGKILL and hang in infinite loop. This would explain a bunch of hangs
> in 9p.

Tricky with the current way we handle this: the normal action when
wait_event_killable is interrupted is to send a tflush message and wait
for its reply. You can also notice this if you send a single SIGKILL: the
task just sends a flush message and waits for that reply instead.

There's work in progress to add refcounting to requests, which would bring
us one step closer to not having to wait for the flush reply (or rather,
it will give us the ability to wait for it asynchronously); we should be
able to get rid of that loop after that.

--
Dominique