Re: [PATCH v2 02/24] EDAC, ghes: Fix grain calculation

From: Robert Richter
Date: Mon Aug 12 2019 - 02:42:31 EST


On 09.08.19 15:15:59, Borislav Petkov wrote:
> On Mon, Jun 24, 2019 at 03:08:57PM +0000, Robert Richter wrote:
> > The conversion from the physical address mask to a grain (defined as
> > granularity in bytes) is broken:
> >
> > e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> >
> > E.g., a physical address mask of ~0xfff should give a grain of 0x1000,
> > instead the grain is wrong with the upper bits always set. We also
> > remove the limitation to the page size as the granularity is unrelated
> > to the page size used in the system. We fix this with:
> >
> > e->grain = ~mem_err->physical_addr_mask + 1;
> >
> > Note: We need to adopt the grain_bits calculation as e->grain is now a
> > power of 2 and no longer a bit mask. The formula is now the same as in
> > edac_mc and can later be unified.
>
> Please refrain from using "We" or "I" or etc personal pronouns in a
> commit message and in the code comments below.
>
> From Documentation/process/submitting-patches.rst:
>
> "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
> instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
> to do frotz", as if you are giving orders to the codebase to change
> its behaviour."
>
> Please fix all your other commit messages for the next submission.

Sure, will reword.

I have seen that you actively promote this style guideline. I was not
even aware of it, thanks for the pointer.

>
> > Signed-off-by: Robert Richter <rrichter@xxxxxxxxxxx>
> > ---
> > drivers/edac/ghes_edac.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
> > index 7f19f1c672c3..d095d98d6a8d 100644
> > --- a/drivers/edac/ghes_edac.c
> > +++ b/drivers/edac/ghes_edac.c
> > @@ -222,6 +222,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> > /* Cleans the error report buffer */
> > memset(e, 0, sizeof (*e));
> > e->error_count = 1;
> > + e->grain = 1;
> > strcpy(e->label, "unknown label");
> > e->msg = pvt->msg;
> > e->other_detail = pvt->other_detail;
> > @@ -317,7 +318,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> >
> > /* Error grain */
> > if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
> > - e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
> > + e->grain = ~mem_err->physical_addr_mask + 1;
>
> This is assuming that that ->physical_addr_mask is contiguous but I
> don't trust any firmware. I guess we can leave it like that for now
> until some "inventive" firmware actually does it.

The grain_bits calculation rounds the grain up to the next power of 2,
so a non-contiguous bit mask still yields a valid, at worst coarser,
granularity. I therefore don't see any issues for non-contiguous bit
masks. I have updated the patch description.
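
To illustrate the rounding, here is a minimal userspace sketch. The
helper names are hypothetical, fls64_sketch() only stands in for the
kernel's fls_long(), and the grain_bits formula fls(grain - 1) is an
assumption here since it is not quoted in this mail:

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's fls_long(): returns the 1-based
 * index of the most significant set bit, or 0 if no bit is set.
 */
static unsigned int fls64_sketch(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Grain as in the patch: two's complement of the address mask. */
static uint64_t grain_from_mask(uint64_t physical_addr_mask)
{
	return ~physical_addr_mask + 1;
}

/*
 * Assumed grain_bits formula: the smallest n with (1 << n) >= grain.
 * Only meaningful for grain != 0, hence the zero check in the patch.
 */
static unsigned int grain_bits_from_grain(uint64_t grain)
{
	return fls64_sketch(grain - 1);
}
```

With a contiguous mask of ~0xfff this gives a grain of 0x1000 and 12
grain bits. With a non-contiguous mask such as ~0x10ff the grain is
0x1100, which is not a power of 2, but the grain bits come out as 13,
i.e. the granularity is rounded up to 0x2000.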

>
> >
> > /* Memory error location, mapped on e->location */
> > p = e->location;
> > @@ -433,8 +434,15 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
> > if (p > pvt->other_detail)
> > *(p - 1) = '\0';
> >
> > + /*
> > + * We expect the hw to report a reasonable grain, fallback to
> > + * 1 byte granularity otherwise.
> > + */
> > + if (WARN_ON_ONCE(!e->grain))
>
> Please move that WARN_ON_ONCE in the
>
> if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
>
> branch above because you're presetting grain to 1 so the warn should be
> close to where it could happen, i.e., when coming from the firmware.

The check is placed here because it will be moved to
edac_raw_mc_handle_error() to unify the edac_mc and ghes code (see
patch #4). I understand that the warning should be close to its
source. On the other hand, the check is needed for all drivers that
set up the grain, so it cannot live in the individual driver that
sets it.
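
A rough userspace sketch of that idea follows. The struct, the counter
and handle_error_sketch() are hypothetical illustrations, not the
actual edac code; the fprintf() merely mimics the warn-once behaviour
of WARN_ON_ONCE():

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for the error descriptor. */
struct err_info_sketch {
	uint64_t grain;		/* granularity in bytes, filled in by a driver */
};

/* Counts how often the fallback was taken; used to warn only once. */
static unsigned int grain_fallbacks;

/*
 * Common handler that every driver would call: a zero grain is
 * reported once and replaced by 1 byte granularity, so no driver
 * needs its own copy of the check.
 */
static void handle_error_sketch(struct err_info_sketch *e)
{
	if (!e->grain) {
		if (!grain_fallbacks++)
			fprintf(stderr, "unexpected zero grain, using 1 byte\n");
		e->grain = 1;
	}
}
```

A driver that fills in a zero grain gets it replaced by 1 in the
common path, while a valid grain such as 0x1000 passes through
unchanged.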

Thanks,

-Robert