Re: Part of devices not initialized with mlx4

From: Leon Romanovsky
Date: Tue Jan 03 2023 - 04:35:19 EST


On Mon, Jan 02, 2023 at 11:33:15AM +0100, Petr Pavlu wrote:
> On 12/18/22 10:53, Leon Romanovsky wrote:
> > On Thu, Dec 15, 2022 at 10:51:15AM +0100, Petr Pavlu wrote:
> >> Hello,
> >>
> >> We have seen an issue where some ConnectX-3 devices are not initialized
> >> when the mlx4 drivers are part of the initrd.
> >
> > <...>
> >
> >> * Systemd stops running services and then sends SIGTERM to "unmanaged" tasks
> >> on the system to terminate them too. This includes the modprobe task.
> >> * Initialization of mlx4_en is interrupted in the middle of its init function.
> >
> > And why do you think that this systemd behaviour is the correct one?
>
> My view is that this is an issue between the kernel and initrd/systemd.
> Switching the root is a delicate operation and both parts need to carefully
> cooperate for it to work correctly.
>
> I think it is generally sensible that systemd tries to terminate any remaining
> processes started from the initrd. They would have trouble when the root is
> switched under their hands anyway, unless they are specifically prepared for
> it. Systemd only skips terminating kthreads and allows excluding root storage
> daemons. A modprobe helper could be excluded from being terminated too but the
> problem with the root switch remains.
>
> It looks to me that a good approach is to complete all running module loads
> before switching the root and continue with any further loads after the
> operation is done. Leaving module loads to udevd assures this, hence the idea
> to use an auxiliary bus.

I'm not sure about it. Everything above is a user-space problem that
arises once systemd switches the root. Anyway, if you want to use an
aux bus for mlx4, go for it.

Feel free to send me patches off-list and I will add them to our
regression testing, but be aware that you are stepping into a
minefield here.

Thanks