Re: [net-next PATCH 2/3] octeontx2-af: Add devlink health reporters for NPA

From: Willem de Bruijn
Date: Mon Nov 02 2020 - 08:42:45 EST


On Mon, Nov 2, 2020 at 12:07 AM George Cherian
<george.cherian@xxxxxxxxxxx> wrote:
>
> Add health reporters for RVU NPA block.
> Only reporter dump is supported
>
> Output:
> # devlink health
> pci/0002:01:00.0:
> reporter npa
> state healthy error 0 recover 0
> # devlink health dump show pci/0002:01:00.0 reporter npa
> NPA_AF_GENERAL:
> Unmap PF Error: 0
> Free Disabled for NIX0 RX: 0
> Free Disabled for NIX0 TX: 0
> Free Disabled for NIX1 RX: 0
> Free Disabled for NIX1 TX: 0
> Free Disabled for SSO: 0
> Free Disabled for TIM: 0
> Free Disabled for DPI: 0
> Free Disabled for AURA: 0
> Alloc Disabled for Resvd: 0
> NPA_AF_ERR:
> Memory Fault on NPA_AQ_INST_S read: 0
> Memory Fault on NPA_AQ_RES_S write: 0
> AQ Doorbell Error: 0
> Poisoned data on NPA_AQ_INST_S read: 0
> Poisoned data on NPA_AQ_RES_S write: 0
> Poisoned data on HW context read: 0
> NPA_AF_RVU:
> Unmap Slot Error: 0
>
> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@xxxxxxxxxxx>
> Signed-off-by: Jerin Jacob <jerinj@xxxxxxxxxxx>
> Signed-off-by: George Cherian <george.cherian@xxxxxxxxxxx>


> +static bool rvu_npa_af_request_irq(struct rvu *rvu, int blkaddr, int offset,
> + const char *name, irq_handler_t fn)
> +{
> + struct rvu_devlink *rvu_dl = rvu->rvu_dl;
> + int rc;
> +
> + WARN_ON(rvu->irq_allocated[offset]);

Please use WARN_ON sparingly, for important unrecoverable events. This
seems like a basic precondition. If it can happen at all, it can probably
be caught in a normal branch with a netdev_err. The stacktrace it prints
is not likely to point at the source of the non-zero value, anyway.
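
To illustrate the normal-branch alternative, here is a rough userspace
sketch. Everything prefixed fake_/FAKE_ is a made-up stand-in rather than
code from the patch, and fprintf takes the place of the kernel logging
call. It only shows the shape of the check, not a drop-in replacement:

```c
#include <stdbool.h>
#include <stdio.h>

#define FAKE_NUM_IRQS 4	/* hypothetical vector count, not from the patch */

/* Hypothetical stand-in for the irq bookkeeping kept in struct rvu. */
struct fake_rvu {
	bool irq_allocated[FAKE_NUM_IRQS];
};

/* Stand-in for request_irq(); always succeeds in this sketch. */
static int fake_request_irq(int offset)
{
	(void)offset;
	return 0;
}

/*
 * Returns true if the vector was requested. A vector that is already
 * marked allocated is logged and rejected in an ordinary branch
 * instead of triggering WARN_ON.
 */
static bool fake_af_request_irq(struct fake_rvu *rvu, int offset)
{
	if (rvu->irq_allocated[offset]) {
		/* In the driver this would be a dev_err()/netdev_err() call. */
		fprintf(stderr, "irq %d already allocated\n", offset);
		return false;
	}

	if (fake_request_irq(offset))
		return false;

	rvu->irq_allocated[offset] = true;
	return true;
}
```

A second request for the same offset then fails quietly with one log
line, and the existing allocation state is left untouched.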

> + rvu->irq_allocated[offset] = false;

Why initialize this here? Are these fields not zeroed on alloc? Is
this here only to safely call rvu_npa_unregister_interrupts on partial
alloc? Then it might be simpler to just have jump labels in this
function to free the successfully requested irqs.
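
The jump-label approach could look roughly like the userspace sketch
below. The demo_ names, the three-vector layout and the failure-injection
variable are all assumptions made up for illustration; in the driver the
calls would be request_irq() and free_irq() on the real vectors:

```c
#include <stdbool.h>

#define DEMO_NUM_IRQS 3	/* hypothetical number of vectors */

/* Tracks which mock vectors are currently requested. */
static bool demo_requested[DEMO_NUM_IRQS];

/* Vector whose request should fail, or -1 for no injected failure. */
static int demo_fail_at = -1;

/* Stand-in for request_irq(). */
static int demo_request_irq(int vec)
{
	if (vec == demo_fail_at)
		return -1;
	demo_requested[vec] = true;
	return 0;
}

/* Stand-in for free_irq(). */
static void demo_free_irq(int vec)
{
	demo_requested[vec] = false;
}

/*
 * Requests all vectors in order. On failure, the labels free only the
 * vectors that were requested before the failing one, in reverse order,
 * so no per-vector "allocated" flag is needed for the error path.
 */
static int demo_register_interrupts(void)
{
	int rc;

	rc = demo_request_irq(0);
	if (rc)
		return rc;

	rc = demo_request_irq(1);
	if (rc)
		goto err_free_irq0;

	rc = demo_request_irq(2);
	if (rc)
		goto err_free_irq1;

	return 0;

err_free_irq1:
	demo_free_irq(1);
err_free_irq0:
	demo_free_irq(0);
	return rc;
}
```

With this shape the function either holds all vectors or none when it
returns, so the caller does not have to clean up after a partial request.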

> + sprintf(&rvu->irq_name[offset * NAME_SIZE], name);
> + rc = request_irq(pci_irq_vector(rvu->pdev, offset), fn, 0,
> + &rvu->irq_name[offset * NAME_SIZE], rvu_dl);
> + if (rc)
> + dev_warn(rvu->dev, "Failed to register %s irq\n", name);
> + else
> + rvu->irq_allocated[offset] = true;
> +
> + return rvu->irq_allocated[offset];
> +}

> +static int rvu_npa_health_reporters_create(struct rvu_devlink *rvu_dl)
> +{
> + struct devlink_health_reporter *rvu_npa_health_reporter;
> + struct rvu_npa_event_cnt *npa_event_count;
> + struct rvu *rvu = rvu_dl->rvu;
> +
> + npa_event_count = kzalloc(sizeof(*npa_event_count), GFP_KERNEL);
> + if (!npa_event_count)
> + return -ENOMEM;
> +
> + rvu_dl->npa_event_cnt = npa_event_count;
> + rvu_npa_health_reporter = devlink_health_reporter_create(rvu_dl->dl,
> + &rvu_npa_hw_fault_reporter_ops,
> + 0, rvu);
> + if (IS_ERR(rvu_npa_health_reporter)) {
> + dev_warn(rvu->dev, "Failed to create npa reporter, err =%ld\n",
> + PTR_ERR(rvu_npa_health_reporter));
> + return PTR_ERR(rvu_npa_health_reporter);
> + }
> +
> + rvu_dl->rvu_npa_health_reporter = rvu_npa_health_reporter;
> + return 0;
> +}
> +
> +static void rvu_npa_health_reporters_destroy(struct rvu_devlink *rvu_dl)
> +{
> + if (!rvu_dl->rvu_npa_health_reporter)
> + return;
> +
> + devlink_health_reporter_destroy(rvu_dl->rvu_npa_health_reporter);
> +}
> +
> +static int rvu_health_reporters_create(struct rvu *rvu)
> +{
> + struct rvu_devlink *rvu_dl;
> +
> + if (!rvu->rvu_dl)
> + return -EINVAL;
> +
> + rvu_dl = rvu->rvu_dl;
> + return rvu_npa_health_reporters_create(rvu_dl);

No need for local var rvu_dl. Here and below.

Without that, the entire helper is probably not needed.

> +}
> +
> +static void rvu_health_reporters_destroy(struct rvu *rvu)
> +{
> + struct rvu_devlink *rvu_dl;
> +
> + if (!rvu->rvu_dl)
> + return;
> +
> + rvu_dl = rvu->rvu_dl;
> + rvu_npa_health_reporters_destroy(rvu_dl);
> +}
> +
> static int rvu_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
> struct netlink_ext_ack *extack)
> {
> @@ -53,7 +483,8 @@ int rvu_register_dl(struct rvu *rvu)
> rvu_dl->dl = dl;
> rvu_dl->rvu = rvu;
> rvu->rvu_dl = rvu_dl;
> - return 0;
> +
> + return rvu_health_reporters_create(rvu);

When would this be called with rvu->rvu_dl == NULL?