Re: [PATCH 2/5] hv: add helpers to handle hv_util device state

From: Vitaly Kuznetsov
Date: Mon Sep 21 2015 - 09:38:20 EST


Olaf Hering <olaf@xxxxxxxxx> writes:

> On Mon, Sep 21, Vitaly Kuznetsov wrote:
>
>> I'd like to see a trace from the hang, it is not obvious to me how it
>> happened and what caused it. (or if you have such hang scenario in your
>> head, can you please reveal it?)
>
> There is no trace. I think fcopy_respond_to_host notifies the host,
> which in turn triggers an interrupt right away which is processed while
> fcopy_on_msg is executing somewhere between the return from
> fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback.
>

I think it is fcopy_transaction.fcopy_context which gets out of sync.

When we're done processing some request we have the following code:

fcopy_transaction.state = HVUTIL_USERSPACE_RECV;
fcopy_respond_to_host(*val);
fcopy_transaction.state = HVUTIL_READY;
hv_poll_channel(fcopy_transaction.fcopy_context,
hv_fcopy_onchannelcallback);

If interrupt happens after we did fcopy_respond_to_host()
fcopy_transaction.state will still be HVUTIL_USERSPACE_RECV or even its
previous HVUTIL_USERSPACE_REQ but it's OK as we have the following in
hv_fcopy_onchannelcallback()

if (fcopy_transaction.state > HVUTIL_READY) {
/*
* We will defer processing this callback once
* the current transaction is complete.
*/
fcopy_transaction.fcopy_context = context;
return;
}

And we're supposed to process the work with hv_poll_channel(). The
problem is (I guess) that fcopy_transaction.fcopy_context gets out of
sync and it still has its previous value (possibly NULL). We call
hv_poll_channel() with NULL and everything gets stuck as we'll never
process the request.

AFAICS proper locking is requred here (and probably in all three
drivers), we need to protect not only .state but the whole transaction.

[...]

--
Vitaly
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/