Re: linuxnext-2019119 edac warns (was Re: edac KASAN warning in experimental arm64 allmodconfig boot)
From: Robert Richter
Date:  Fri Nov 22 2019 - 06:29:14 EST
On 21.11.19 15:23:42, John Garry wrote:
> On 21/11/2019 14:23, Robert Richter wrote:
> > On 21.11.19 12:34:22, John Garry wrote:
> > > [   22.046666] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.046666]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.058311] ghes_edac: Can't register at EDAC core
> > > [   22.065402] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.065402]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.077080] ghes_edac: Can't register at EDAC core
> > > [   22.084140] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.084140]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.095789] ghes_edac: Can't register at EDAC core
> > > [   22.102873] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.102873]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.115442] ghes_edac: Can't register at EDAC core
> > > [   22.122536] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.122536]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.134344] ghes_edac: Can't register at EDAC core
> > > [   22.141441] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.141441]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.153089] ghes_edac: Can't register at EDAC core
> > > [   22.160161] EDAC MC: bug in low-level driver: attempt to assign
> > > [   22.160161]     duplicate mc_idx 0 in add_mc_to_global_list()
> > > [   22.171810] ghes_edac: Can't register at EDAC core
> > 
> > What I am more concerned is this here. In total this implies 8 ghes
> > users that all try to register a (single-instance) ghes mc device. For
> > non-x86 only one instance is allowed (see ghes_edac_register(), idx =
> > 0).

I also looked into this: With refcount_inc_checked() enabled, the
refcount is *not* increased from 0 to 1, because under the hood
refcount_inc_not_zero() is called instead of refcount_inc(). So the
refcount is still zero after an edac mc device has been registered.
Instead of sharing the edac mc device, the driver then tries to
allocate another mc device for each further GHESv2 entry in the HEST
table. This causes the 'duplicate mc_idx' messages. Also, it is ok to
have multiple GHESv2 entries (your system seems to have 8 entries),
e.g. to serve different kinds of errors in the system.
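
To illustrate the effect, here is a minimal userspace sketch. It is
not the ghes_edac code: all toy_* names are made up for this example
and the register path is only an assumption about the general shape
(share the device if the refcount is non-zero, otherwise register
mc_idx 0 and take the first reference). It shows why a checked
increment that refuses to go from 0 to 1 leads to one duplicate
attempt per additional entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical, not kernel API. */
struct toy_driver {
	unsigned int refs;	/* users of the single shared mc device */
	bool mc_registered;	/* mc_idx 0 already in the global list? */
	int duplicate_errors;	/* count of 'duplicate mc_idx' failures */
};

/* Mimics inc-not-zero semantics: never moves the count from 0 to 1. */
static bool toy_inc_not_zero(unsigned int *refs)
{
	if (*refs == 0)
		return false;
	(*refs)++;
	return true;
}

/* One register call, as issued once per (assumed) GHESv2 entry. */
static int toy_register(struct toy_driver *d, bool checked_inc)
{
	/* A device is already shared: just take another reference. */
	if (toy_inc_not_zero(&d->refs))
		return 0;

	/* Otherwise try to add a new device with index 0. */
	if (d->mc_registered) {
		d->duplicate_errors++;
		return -1;
	}
	d->mc_registered = true;

	/* Take the first reference for the new device. */
	if (checked_inc)
		toy_inc_not_zero(&d->refs);	/* no effect, stays 0 */
	else
		d->refs++;			/* plain increment, 0 -> 1 */
	return 0;
}

/* Run 'entries' register calls and report the duplicate failures. */
static int toy_count_duplicates(int entries, bool checked_inc)
{
	struct toy_driver d = { 0 };
	int i;

	assert(entries >= 0);
	for (i = 0; i < entries; i++)
		toy_register(&d, checked_inc);
	return d.duplicate_errors;
}
```

In this model, toy_count_duplicates(8, true) gives 7 failed attempts
for 8 entries, while toy_count_duplicates(8, false) gives none since
all callers share the one device.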
Thanks,
-Robert