Re: [PATCH] acpi: apei: clear error status before acknowledging the error

From: Baicar, Tyler
Date: Mon Jul 31 2017 - 13:45:08 EST


On 7/31/2017 11:00 AM, Luck, Tony wrote:
On Mon, Jul 31, 2017 at 10:15:27AM -0600, Baicar, Tyler wrote:
I think the better thing to do in this case is to still send the ack. If
ghes_read_estatus() fails, then either we are unable to read the estatus
or the estatus is empty/invalid.
Right now we silently handle that failure of ghes_read_estatus(). That
might be hiding some Linux bugs if we are calling ghes_proc() in cases
where we shouldn't.

Perhaps we should have something like this, so if systems do start acting
weirdly there will be a note that we took this path:

	rc = ghes_read_estatus(ghes, 0);
	if (rc) {
		pr_notice("surprise failure reading ghes estatus\n");
		goto out;
	}
Thank you Tony for the feedback, I can add a print like this in the next
version. I'll verify that rc is not -ENOENT though, so we don't print it
on empty scenarios, since the polled source will be hitting this path
frequently.
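
A minimal sketch of that check, written as a standalone user-space
helper so it can be compiled outside the kernel. The helper name
ghes_read_failure_is_surprising() is hypothetical and is not part of
the posted patch; it only assumes, as described above, that -ENOENT is
the "nothing to read" result that polled sources see on most polls.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): decide whether a return
 * code from ghes_read_estatus() deserves a log line.
 *
 * Assumption: 0 means a status block was read, and -ENOENT means the
 * status was empty, which is routine for polled sources and should
 * stay silent. Any other non-zero value is treated as unexpected.
 */
static bool ghes_read_failure_is_surprising(int rc)
{
	return rc != 0 && rc != -ENOENT;
}
```

In the snippet quoted above, the pr_notice() call would then be guarded
by this predicate while the "goto out" stays unconditional for any
non-zero rc.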

-Tyler

If we do not send the ack, then we will be in a scenario where FW will not
send any more errors.
We might ACK something that the firmware didn't send, which may
lead to other problems.

I think it would be better to still have the FW send the errors and kernel
complain about issues with
But I agree with this. We should send the ACK. Luckily this doesn't have
a long legacy problem because the whole ACK mechanism is a new thing. So
we only have to worry about GHESv2-supporting BIOS.
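
The control flow being agreed on here can be modelled roughly as
follows. This is a user-space sketch, not the kernel code: struct
fake_ghes, its fields, and fake_ghes_proc() are made-up stand-ins, and
the read result is passed in as a parameter instead of calling
ghes_read_estatus(). Whether an empty status (-ENOENT) should also be
acknowledged is not settled in this message; the sketch simply
acknowledges on every path for a GHESv2 source.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-source state; not the kernel struct. */
struct fake_ghes {
	bool is_v2;          /* source advertises the GHESv2 ack mechanism */
	int records_handled; /* status blocks that were read successfully */
	int acks_sent;       /* acknowledgements written back to firmware */
};

/*
 * Model of the processing path: a failed read skips the handling step
 * but still falls through to the acknowledgement, so firmware is not
 * left waiting and can keep reporting later errors.
 */
static int fake_ghes_proc(struct fake_ghes *g, int read_rc)
{
	int rc = read_rc;

	if (rc)
		goto out;

	g->records_handled++;
out:
	/* Only GHESv2 sources have an ack register to write. */
	if (g->is_v2)
		g->acks_sent++;
	return rc;
}
```

A non-v2 source never reaches the ack step in this model, which matches
the point that only GHESv2-capable firmware is affected.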

-Tony

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.