Re: [PATCH v3 net 4/4] pds_core: make wait_context part of q_info

From: Jakub Kicinski
Date: Thu Apr 17 2025 - 11:22:19 EST


On Tue, 15 Apr 2025 16:29:31 -0700 Shannon Nelson wrote:
> Make the wait_context a full part of the q_info struct rather
> than a stack variable that goes away after pdsc_adminq_post()
> is done so that the context is still available after the wait
> loop has given up.
>
> There was a case where a slow development firmware caused
> the adminq request to time out, but then later the FW finally
> finished the request and sent the interrupt. The handler tried
> to complete_all() the completion context that had been created
> on the stack in pdsc_adminq_post() but no longer existed.
> This caused bad pointer usage, kernel crashes, and much wailing
> and gnashing of teeth.

The patch will certainly redirect the access from the stack.
But since you're already processing the completions under a spin
lock, is it not possible to safely invalidate the completion
under the same lock on timeout?

Perhaps not I haven't looked very closely.

> + wc = &pdsc->adminqcq.q.info[index].wc;
> + wc->wait_completion = COMPLETION_INITIALIZER_ONSTACK(wc->wait_completion);

_ONSTACK you say? I don't think it's on the stack any more.