RE: NULL pointer dereference in igen6_probe - 6.16-rc2
From: Zhuo, Qiuxu
Date: Tue Jun 17 2025 - 10:25:09 EST
Hi Boris,
> From: Borislav Petkov <bp@xxxxxxxxx>
> [...]
> > [ 13.565035] EDAC MC0: Giving out device to module igen6_edac controller Intel_client_SoC MC#0: DEV 0000:00:00.0 (INTERRUPT)
> > [ 13.565746] EDAC igen6: Expected 2 mcs, but only 1 detected.
>
> Well, folks, if you've detected only one memory controller, then work with
> only one and do not kill the machine:
>
Yes.
> diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
> index 1930dc00c791..23e26ba2d49b 100644
> --- a/drivers/edac/igen6_edac.c
> +++ b/drivers/edac/igen6_edac.c
> @@ -1350,9 +1350,11 @@ static int igen6_register_mcis(struct pci_dev *pdev, u64 mchbar)
> return -ENODEV;
> }
>
> - if (lmc < res_cfg->num_imc)
> + if (lmc < res_cfg->num_imc) {
> igen6_printk(KERN_WARNING, "Expected %d mcs, but only %d detected.",
> res_cfg->num_imc, lmc);
> + res_cfg->num_imc = lmc;
> + }
>
> return 0;
>
> ---
>
> but then that cfg struct is const :-\
>
> drivers/edac/igen6_edac.c: In function ‘igen6_register_mcis’:
> drivers/edac/igen6_edac.c:1356:34: error: assignment of member ‘num_imc’ in read-only object
> 1356 | res_cfg->num_imc = lmc;
> | ^
>
>
> Unless it is some gunky crap this coreboot does - then we will have to have a
> longer talk.
>
> 😝
In the i10nm_edac driver for Intel Xeon servers, 'cfg' is non-const, and the
field 'cfg->ddr_imc_num' [1] is overwritten at runtime with the number of
detected DDR memory controllers.
Making 'cfg' in the igen6_edac driver non-const again, so that it can be set
to the actual number of detected memory controllers, seems reasonable.
With that done, applying Boris' fix above is the simplest way to resolve the
issue. 😊
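A minimal userspace sketch of that approach, not the actual patch: the struct
'res_config_sketch', the object 'cfg_sketch' and the helper 'clamp_num_imc()'
are hypothetical names made up for illustration. Only the 'num_imc' field, the
-ENODEV return and the warning text come from the diff quoted above. The
zero-detected check is an assumption about the context that precedes the hunk.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's per-SoC config. Only the
 * num_imc field is taken from the thread. */
struct res_config_sketch {
	int num_imc;	/* expected number of memory controllers */
};

/* Declared without const, so the probe path is allowed to update it. */
static struct res_config_sketch cfg_sketch = { .num_imc = 2 };

/* Mirrors the tail of the quoted hunk: lmc is the number of memory
 * controllers that were actually detected. */
static int clamp_num_imc(struct res_config_sketch *cfg, int lmc)
{
	/* Assumed: nothing detected leads to the -ENODEV return
	 * visible in the diff context. */
	if (lmc <= 0)
		return -ENODEV;

	if (lmc < cfg->num_imc) {
		fprintf(stderr, "Expected %d mcs, but only %d detected.\n",
			cfg->num_imc, lmc);
		/* Later code that loops up to num_imc now only sees the
		 * controllers that really exist. */
		cfg->num_imc = lmc;
	}

	return 0;
}
```

With 'cfg_sketch' starting at 2 and one controller detected,
clamp_num_imc(&cfg_sketch, 1) returns 0 and leaves num_imc at 1. Declaring the
object const instead would reproduce the "assignment of member in read-only
object" error quoted above.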
[1] https://github.com/torvalds/linux/blob/master/drivers/edac/i10nm_base.c#L479
Thanks.
-Qiuxu