Re: [PATCH] ghes: Track number of recovered hardware errors

From: Breno Leitao
Date: Wed Jul 16 2025 - 08:43:12 EST

Next message: Andrea Righi: "Re: [PATCH 1/2] sched_ext: Track currently locked rq"
Previous message: Darren Ye (叶飞): "Re: [PATCH v6 08/10] ASoC: dt-bindings: mediatek,mt8196-afe: add audio AFE"
In reply to: Mauro Carvalho Chehab: "Re: [PATCH] ghes: Track number of recovered hardware errors"
Next in thread: Shuai Xue: "Re: [PATCH] ghes: Track number of recovered hardware errors"

Hello Shuai,

On Wed, Jul 16, 2025 at 11:04:28AM +0800, Shuai Xue wrote:
> > My plan with this patch is to have a counter for hardware errors that
> > would be exposed to the crashdump. So, post-mortem analysis tooling can
> > easily query if there are hardware errors and query RAS information in
> > the right databases, in case it seems a smoking gun.
>
> I see your point. But does using a single ghes_recovered_errors counter
> to track all corrected and non-fatal errors for CPU, memory, and PCIe
> really help?

It provides a quick indication that hardware issues have occurred, which
can prompt the operator to investigate further via RAS events.

That said, Tony proposed a more robust approach: categorizing and
tracking errors by their source. This would involve maintaining a
separate counter for each source, indexed by an enum:

	enum recovered_error_sources {
		ERR_GHES,
		ERR_MCE,
		ERR_AER,
		...
		ERR_NUM_SOURCES
	};

See more at: https://lore.kernel.org/all/aHWC-J851eaHa_Au@agluck-desk3/
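
To make the idea concrete, here is a rough userspace sketch of what
per-source counting could look like. This is not the code from the patch
or from Tony's proposal: the counter array and the two helper names are
made up for illustration, C11 atomics stand in for whatever primitive
the kernel code would actually use, and only the three sources named
above are listed.

```c
#include <stdatomic.h>

/* Sources named in the thread; the original list is elided after ERR_AER. */
enum recovered_error_sources {
	ERR_GHES,
	ERR_MCE,
	ERR_AER,
	ERR_NUM_SOURCES
};

/* Hypothetical storage: one counter per source, zero-initialized. */
static atomic_uint recovered_error_count[ERR_NUM_SOURCES];

/* Hypothetical helper: record one recovered error for the given source. */
static void record_recovered_error(enum recovered_error_sources src)
{
	if ((unsigned int)src < ERR_NUM_SOURCES)
		atomic_fetch_add_explicit(&recovered_error_count[src], 1,
					  memory_order_relaxed);
}

/* Hypothetical helper: read one counter, e.g. from post-mortem tooling. */
static unsigned int recovered_errors(enum recovered_error_sources src)
{
	if ((unsigned int)src >= ERR_NUM_SOURCES)
		return 0;
	return atomic_load_explicit(&recovered_error_count[src],
				    memory_order_relaxed);
}
```

With something along these lines, each error path would call
record_recovered_error() with its own source, and tooling could tell
from the array which subsystem reported errors instead of seeing a
single aggregate number.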

Do you think this would help you by any chance?

Thanks
--breno
