Re: [regression] nct6775 does not load in 5.4 and 5.5, bisected to b84398d6d7f90080

From: Martin Volf
Date: Sun Feb 23 2020 - 02:07:00 EST


Hello,

On Sat, Feb 22, 2020 at 10:26 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> On 2/22/20 12:49 PM, Martin Volf wrote:
> > On Sat, Feb 22, 2020 at 8:05 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >> On 2/22/20 9:55 AM, Martin Volf wrote:
> >>> On Sat, Feb 22, 2020 at 4:41 PM Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> >>>> On 2/22/20 3:13 AM, Martin Volf wrote:
> >>>>> hardware monitoring sensors NCT6796D on my Asus PRIME Z390M-PLUS
> >>>>> motherboard with Intel i7-9700 CPU don't work with 5.4 and newer linux
> >>>>> kernels, the driver nct6775 does not load.
> >>>>>
> >>>>> It is working OK in version 5.3. I have used almost all released stable
> >>>>> versions from 5.3.8 to 5.3.16; I didn't try older kernels.
> >>> ...
> >>>> My wild guess would be that the i801 driver is a bit aggressive with
> >>>> reserving memory spaces, but I don't immediately see what it does
> >>>> differently in that regard after the offending patch. Does it work
> >>>> if you unload the i2c_i801 driver first ?
> >>>
> >>> Yes, after unloading i2c_i801, the nct6775 works.
> > ...
> >>> This is diff of /proc/ioports in 5.3.18 with loaded nct6775 and in
> >>> 5.4.21 without:
> >>>
> >>> --- ioports-5.3.18
> >>> +++ ioports-5.4.21
> >>> @@ -2,6 +2,7 @@
> >>> 0000-001f : dma1
> >>> 0020-0021 : pic1
> >>> 002e-0031 : iTCO_wdt
> >>> + 002e-0031 : iTCO_wdt
> >>> 0040-0043 : timer0
> >>> 0050-0053 : timer1
> > ...
> >>> So 0x2e is the resource the two drivers are fighting for.
> > ...
> >> Yes, and it should not do that, since the range can be used to access
> >> different segments of the same chip from multiple drivers. This region
> >> should only be reserved temporarily, using request_muxed_region() when
> >> needed and release_region() after the access is complete. In either case,
> >> I don't immediately see why that region would be interesting for the
> >> iTCO watchdog driver.
> >>
> >> Can you add some debugging into the i801 driver to see what memory regions
> >> it reserves, and how it gets to reserve 0x2e..0x31 ? That range really
> >> doesn't make any sense to me.
> >
> > in the function i801_add_tco() in drivers/i2c/busses/i2c-i801.c
> > (line 1601 in 5.4.21), there is this code:
> >
> > /*
> > * Power Management registers.
> > */
> > devfn = PCI_DEVFN(PCI_SLOT(pci_dev->devfn), 2);
> > pci_bus_read_config_dword(pci_dev->bus, devfn, ACPIBASE, &base_addr);
> >
> > res = &tco_res[ICH_RES_IO_SMI];
> > res->start = (base_addr & ~1) + ACPIBASE_SMI_OFF;
> > res->end = res->start + 3;
> > res->flags = IORESOURCE_IO;
> >
> > base_addr is 0xffffffff after pci_bus_read_config_dword() call.
> > ACPIBASE_SMI_OFF is 0x030, therefore res->start is 0x2e.
> > Not that I understand even a bit of this...
> >
>
> Ouch. This means that the code is broken. ACPIBASE is not configured,
> or disabled, or the code reads from the wrong PCI configuration register.
> What I don't understand is why this works with v5.3 kernels; the code
> looks just as bad there to me. I must be missing something. In either
> case, the only thing you can really do at this point is to blacklist
> the iTCO_wdt driver.
>
> Other than that, we can only hope that someone who understands the above
> code can provide a fix. Maybe Wolfram has an idea.

I have disabled the watchdog subsystem in the kernel config (v5.5.5),
along with the modprobe.d workaround, and the sensors are working.

Thanks a lot for your support!

Martin