Re: [PATCH] acpi: apei: clear error status before acknowledging the error

From: Luck, Tony
Date: Mon Jul 31 2017 - 13:00:27 EST


On Mon, Jul 31, 2017 at 10:15:27AM -0600, Baicar, Tyler wrote:
> I think the better thing to do in this case is still send the ack. If
> ghes_read_estatus() fails, then either we are unable to read the estatus
> or the estatus is empty/invalid.

Right now we silently handle that failure of ghes_read_estatus(). That
might be hiding some Linux bugs if we are calling ghes_proc() in cases
where we shouldn't.

Perhaps we should have something like this, so if systems do start acting
weirdly there will be a note that we took this path:

	rc = ghes_read_estatus(ghes, 0);
	if (rc) {
		pr_notice("surprise failure reading ghes estatus\n");
		goto out;
	}


> If we do not send the ack, then we will be in a scenario where FW will not
> send any more errors.

We might ACK something that the firmware didn't send, which may
lead to other problems.

> I think it would be better to still have the FW send the errors and kernel
> complain about issues with

But I agree with this. We should send the ACK. Luckily this doesn't come
with a long legacy problem, because the whole ACK mechanism is a new thing.
So we only have to worry about BIOSes that support GHESv2.
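
Putting the two points together (note the failure, but still ACK), the
flow I have in mind looks roughly like the user-space sketch below. This is
only an illustration: every fake_* name, the struct layout and the
errno-style return value are made up for the example and are not the real
ghes.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an error source; NOT the kernel's struct ghes. */
struct fake_ghes {
	bool is_v2;          /* only GHESv2 sources expect an acknowledgment */
	int  read_result;    /* what the mock estatus read returns (0 = success) */
	int  acks_sent;      /* number of ACKs written back to "firmware" */
	int  notices_logged; /* number of times the failure path was noted */
};

/* Mock of reading the error status block; returns 0 on success. */
static int fake_read_estatus(const struct fake_ghes *g)
{
	return g->read_result;
}

/* Mock of the GHESv2 acknowledgment write. */
static void fake_ack_error(struct fake_ghes *g)
{
	g->acks_sent++;
}

/* Process one notification: leave a note if the read fails, ACK either way. */
int fake_ghes_proc(struct fake_ghes *g)
{
	int rc;

	assert(g != NULL);

	rc = fake_read_estatus(g);
	if (rc) {
		/* stands in for pr_notice() in the kernel */
		fprintf(stderr, "surprise failure reading ghes estatus\n");
		g->notices_logged++;
		goto out;
	}

	/* ... a real handler would report and clear the estatus here ... */

out:
	/* ACK even on failure so firmware keeps delivering errors. */
	if (g->is_v2)
		fake_ack_error(g);
	return rc;
}
```

With a failing read on a v2 source this logs one notice and still sends
exactly one ACK; a non-v2 source never gets an ACK.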

-Tony