Re: [PATCH net v2] octeontx2-pf: Fix page pool cache index corruption.

From: Sebastian Andrzej Siewior
Date: Thu Sep 07 2023 - 13:21:11 EST


On 2023-09-07 07:17:11 [+0530], Ratheesh Kannoth wrote:
> Access to the page pool `cache' array and the `count' variable
> is not locked. Page pool cache access is fine as long as there
> is only one consumer per pool.
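The single-consumer assumption can be made concrete with a minimal userspace sketch of a lockless LIFO buffer cache: an array plus a `count`, loosely modeled on the page pool's per-pool alloc cache. `struct alloc_cache`, `cache_push()`, `cache_pop()` and `CACHE_SIZE` are hypothetical names, not the kernel's API. The only point is that the unlocked `cache[--count]` sequence is correct when exactly one context calls it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a page pool's lockless alloc cache.
 * This is NOT the kernel's structure, only an illustration. */
#define CACHE_SIZE 8

struct alloc_cache {
	unsigned int count;        /* number of valid entries in cache[] */
	void *cache[CACHE_SIZE];   /* LIFO stack of recycled buffers */
};

/* Push a buffer; returns 0 on success, -1 if the cache is full.
 * No locking: only safe while a single context owns the cache. */
static int cache_push(struct alloc_cache *c, void *buf)
{
	if (c->count == CACHE_SIZE)
		return -1;
	c->cache[c->count++] = buf;
	return 0;
}

/* Pop the most recently pushed buffer, or NULL if empty. The
 * read-modify-write of count is not atomic, which is fine with
 * exactly one consumer. */
static void *cache_pop(struct alloc_cache *c)
{
	assert(c->count <= CACHE_SIZE);   /* invariant of the model */
	if (c->count == 0)
		return NULL;
	return c->cache[--c->count];
}
```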
>
> The octeontx2 driver fills in rx buffers from the page pool in NAPI
> context. If the system is stressed and cannot allocate buffers, the
> refilling work is delegated to a delayed workqueue. This means that
> there are two consumers of the page pool cache.
>
> The workqueue and IRQ/NAPI can run on different CPUs at the same time.
> This leads to unlocked concurrent access and hence to corruption of the
> cache pool indexes.
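The corruption described above can be reproduced deterministically by splitting the unlocked pop into its load and store halves and interleaving two consumers by hand. This is a hypothetical userspace model (every name is invented for illustration), not driver code. Both consumers read the same `count`, so they hand out the same buffer while `count` drops by only one.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 8

/* Hypothetical model of the pool's array cache (not kernel code). */
struct alloc_cache {
	unsigned int count;
	void *cache[CACHE_SIZE];
};

/* One in-flight "return cache[--count]", split at the point where
 * another CPU can interleave. */
struct pop_in_flight {
	unsigned int seen;   /* value of count loaded by this consumer */
};

static void pop_load(const struct alloc_cache *c, struct pop_in_flight *p)
{
	p->seen = c->count;
}

static void *pop_store(struct alloc_cache *c, const struct pop_in_flight *p)
{
	assert(p->seen <= CACHE_SIZE);
	if (p->seen == 0)
		return NULL;
	c->count = p->seen - 1;          /* may write back a stale value */
	return c->cache[p->seen - 1];
}

/* NAPI and the refill worker each pop once, with both loads happening
 * before either store. Returns the count the cache is left with. */
static unsigned int racy_double_pop(struct alloc_cache *c,
				    void **napi_buf, void **work_buf)
{
	struct pop_in_flight napi, work;

	pop_load(c, &napi);              /* CPU A reads count */
	pop_load(c, &work);              /* CPU B reads the same count */
	*napi_buf = pop_store(c, &napi);
	*work_buf = pop_store(c, &work);
	return c->count;
}
```

Starting from two cached buffers, both consumers receive the same pointer and `count` ends at 1 instead of 0, so one buffer ends up in two rx descriptors.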
>
> To fix this issue, the workqueue no longer refills rx buffers itself;
> it reschedules NAPI, which then does the refill.
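The shape of the fix can be sketched as a hypothetical userspace model. `poll_pending` stands in for "NAPI is scheduled", `rx_poll()` for the NAPI poll routine, and `refill_worker()` for `otx2_pool_refill_task()`. All of these names are invented. The worker only requests another poll and never touches the cache, so the poll routine remains the cache's sole consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 8

/* Hypothetical model, not driver code. */
struct alloc_cache {
	unsigned int count;
	void *cache[CACHE_SIZE];
};

struct rx_queue {
	struct alloc_cache cache;   /* touched only by rx_poll() */
	unsigned int rx_ring_free;  /* hypothetical: empty rx descriptors */
	atomic_bool poll_pending;   /* stands in for "NAPI is scheduled" */
};

/* Refill worker after the fix: it does not pop from the cache, it only
 * asks the single consumer to run again (cf. napi_schedule()). */
static void refill_worker(struct rx_queue *q)
{
	atomic_store(&q->poll_pending, true);
}

/* The poll routine is the only consumer of the cache, so the unlocked
 * pop is safe. Returns the number of descriptors refilled. */
static unsigned int rx_poll(struct rx_queue *q)
{
	unsigned int filled = 0;

	if (!atomic_exchange(&q->poll_pending, false))
		return 0;               /* nobody asked for a poll */
	assert(q->cache.count <= CACHE_SIZE);
	while (q->rx_ring_free > 0 && q->cache.count > 0) {
		void *buf = q->cache.cache[--q->cache.count];

		(void)buf;              /* would be posted to the hardware */
		q->rx_ring_free--;
		filled++;
	}
	return filled;
}
```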
>
> Fixes: b2e3406a38f0 ("octeontx2-pf: Add support for page pool")
> Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>

Reported-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index 8511906cb4e2..997fedac3a98 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> static void otx2_pool_refill_task(struct work_struct *work)
> {
> 	struct otx2_cq_queue *cq;
> -	struct otx2_pool *rbpool;
> 	struct refill_work *wrk;
> -	int qidx, free_ptrs = 0;
> 	struct otx2_nic *pfvf;
> -	dma_addr_t bufptr;
> +	int qidx;
>
> 	wrk = container_of(work, struct refill_work, pool_refill_work.work);
> 	pfvf = wrk->pf;
> 	qidx = wrk - pfvf->refill_wrk;
> 	cq = &pfvf->qset.cq[qidx];

> 	cq->refill_task_sched = false;
> +
> +	local_bh_disable();
> +	napi_schedule(wrk->napi);
> +	local_bh_enable();

This is a nitpick since I haven't looked at how it works exactly: is it
possible that the wrk->napi pointer gets overwritten by
otx2_napi_handler(), since you clear cq->refill_task_sched before
reading wrk->napi?
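One way to close the window asked about above is sketched below as a hypothetical userspace model with C11 atomics. It is not what the patch does and has not been checked against the driver. The poll side writes the pointer only while the flag is clear, so the worker reads the pointer while the flag is still set and clears the flag afterwards. `struct napi`, `struct refill_slot` and both functions are invented names. Whether an overwrite could actually matter depends on whether `work->napi` can ever hold a different value for the same CQ, and this sketch does not establish that.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct napi { int id; };         /* hypothetical stand-in for napi_struct */

struct refill_slot {
	struct napi *napi;           /* written by the poll side only */
	atomic_bool task_sched;      /* models cq->refill_task_sched */
};

/* Poll side (assumed single caller per slot, as with one NAPI instance
 * per CQ): publish the pointer only while no task is pending. */
static bool poll_request_refill(struct refill_slot *s, struct napi *n)
{
	if (atomic_load_explicit(&s->task_sched, memory_order_acquire))
		return false;            /* a refill task is already pending */
	s->napi = n;
	atomic_store_explicit(&s->task_sched, true, memory_order_release);
	return true;                 /* caller would queue the delayed work */
}

/* Worker side: snapshot the pointer before clearing the flag, so the
 * poll side cannot overwrite it between the clear and the use. */
static struct napi *worker_take_napi(struct refill_slot *s)
{
	struct napi *n;

	/* Pairs with the release store in poll_request_refill(). */
	if (!atomic_load_explicit(&s->task_sched, memory_order_acquire))
		return NULL;             /* no refill was requested */
	n = s->napi;                 /* stable while task_sched is true */
	atomic_store_explicit(&s->task_sched, false, memory_order_release);
	return n;                    /* caller would napi_schedule(n) */
}
```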

> }
>
> int otx2_config_nix_queues(struct otx2_nic *pfvf)
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> index e369baf11530..b778ed366f81 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_txrx.c
> @@ -561,9 +565,24 @@ int otx2_napi_handler(struct napi_struct *napi, int budget)
> 				otx2_config_irq_coalescing(pfvf, i);
> 		}
>
> -		/* Re-enable interrupts */
> -		otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
> -			     BIT_ULL(0));
> +		if (unlikely(!filled_cnt)) {
> +			struct refill_work *work;
> +			struct delayed_work *dwork;
> +
> +			work = &pfvf->refill_wrk[cq->cq_idx];
> +			dwork = &work->pool_refill_work;
> +			/* Schedule a task if no other task is running */
> +			if (!cq->refill_task_sched) {
> +				work->napi = napi;
> +				cq->refill_task_sched = true;
> +				schedule_delayed_work(dwork,
> +						      msecs_to_jiffies(100));
> +			}
> +		} else {
> +			/* Re-enable interrupts */
> +			otx2_write64(pfvf, NIX_LF_CINTX_ENA_W1S(cq_poll->cint_idx),
> +				     BIT_ULL(0));
> +		}
> 	}
> 	return workdone;
> }
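For reference, the branch added to otx2_napi_handler() reduces to the small decision function below. It is a hypothetical model (`enum poll_action`, `struct cq_state` and `napi_complete_action()` are invented names) that leaves out the actual work queueing and register writes. When nothing was refilled, the CQ interrupt is not re-enabled and at most one refill task is queued per CQ. That presumably leaves the delayed work's napi_schedule() as the way back into the poll loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one NAPI completion in the patched handler. */
enum poll_action {
	ACTION_REENABLE_IRQ,   /* buffers were refilled: unmask the CQ IRQ */
	ACTION_QUEUE_REFILL,   /* nothing refilled: queue the delayed work */
	ACTION_NONE,           /* nothing refilled, a refill task is pending */
};

struct cq_state {
	bool refill_task_sched;    /* models cq->refill_task_sched */
};

/* Mirrors the if/else structure of the new code: the interrupt stays
 * masked while the queue is starved, and the flag ensures that at most
 * one refill task is pending at a time. */
static enum poll_action napi_complete_action(struct cq_state *cq,
					     int filled_cnt)
{
	if (filled_cnt)
		return ACTION_REENABLE_IRQ;
	if (cq->refill_task_sched)
		return ACTION_NONE;
	cq->refill_task_sched = true;
	return ACTION_QUEUE_REFILL;
}
```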

Sebastian