Re: [REGRESSION 5.19] NULL dereference by ucsi_acpi driver

From: Takashi Iwai
Date: Tue Aug 23 2022 - 02:52:58 EST


On Tue, 23 Aug 2022 08:41:00 +0200,
Greg Kroah-Hartman wrote:
>
> On Tue, Aug 23, 2022 at 10:26:59AM +0800, Linyu Yuan wrote:
> >
> > On 8/22/2022 9:24 PM, Heikki Krogerus wrote:
> > > Hi,
> > >
> > > On Sat, Aug 20, 2022 at 08:40:52PM +0200, Greg Kroah-Hartman wrote:
> > > > On Fri, Aug 19, 2022 at 06:32:43PM +0200, Takashi Iwai wrote:
> > > > > Hi,
> > > > >
> > > > > we've got multiple reports of the 5.19 kernel starting to crash after
> > > > > some time, and this turned out to be triggered by the ucsi_acpi driver.
> > > > > The details can be found at:
> > > > > https://bugzilla.suse.com/show_bug.cgi?id=1202386
> > > > >
> > > > > The culprit seems to be the commit 87d0e2f41b8c
> > > > > usb: typec: ucsi: add a common function ucsi_unregister_connectors()
> > > > Adding Heikki to the thread...
> > > >
> > > > > This commit looks like a harmless cleanup, but it fails in
> > > > > a subtle way. Namely, in the error scenario, the driver gets an error
> > > > > at ucsi_register_altmodes() and goes to the error handling to release
> > > > > the resources. Through this refactoring, the release part was unified
> > > > > into a function, ucsi_unregister_connectors(). That function has a NULL
> > > > > check of con->wq, and it bails out of the loop if it's NULL.
> > > > > Meanwhile, ucsi_register_port() itself still calls destroy_workqueue()
> > > > > and clears con->wq in its error path. This results in a leftover
> > > > > power supply device with an uninitialized / cleared device.
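The interaction described above can be modelled in user space. What follows is only a rough sketch, not the kernel code: struct model_con, the boolean fields, and all model_* functions are hypothetical stand-ins, and the ordering (power supply registered before the step that fails) is assumed from the description of the ucsi_register_altmodes() failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PORTS 8

/* Hypothetical stand-in for the per-connector state. */
struct model_con {
	bool wq;	/* true while the workqueue "exists" */
	bool psy;	/* true while the power supply is "registered" */
};

/*
 * Models the register path: both resources are set up, then a late
 * step may fail. The error path releases and clears only the
 * workqueue, mirroring "destroy_workqueue(); con->wq = NULL;".
 */
static int model_register_port(struct model_con *con, bool fail_late)
{
	con->wq = true;
	con->psy = true;
	if (fail_late) {
		con->wq = false;
		return -1;
	}
	return 0;
}

/* Models the unified cleanup: it stops at the first connector without a wq. */
static void model_unregister_connectors(struct model_con *cons, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!cons[i].wq)
			break;
		cons[i].psy = false;
		cons[i].wq = false;
	}
}

/*
 * Registers n_ports connectors, failing at index failing_port (pass an
 * index >= n_ports for "no failure"), runs the unified cleanup, and
 * returns how many power supplies are still registered afterwards.
 */
static size_t model_leftover_psy(size_t n_ports, size_t failing_port)
{
	struct model_con cons[MODEL_MAX_PORTS] = { { false, false } };
	size_t i, left = 0;

	assert(n_ports <= MODEL_MAX_PORTS);

	for (i = 0; i < n_ports; i++)
		if (model_register_port(&cons[i], i == failing_port))
			break;

	model_unregister_connectors(cons, n_ports);

	for (i = 0; i < n_ports; i++)
		left += cons[i].psy ? 1 : 0;
	return left;
}
```

In this model, a failure on any port leaves that port's power supply registered after cleanup, because the cleared wq makes the loop bail out before reaching it. Without a failure nothing is left behind.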
> > > > >
> > > > > It was confirmed that the problem could be avoided by a simple
> > > > > revert.
> > > > I'll be glad to revert this now, unless Heikki thinks:
> > > >
> > > > > I guess another fix could be removing the part clearing con->wq, i.e.
> > > > >
> > > > > --- a/drivers/usb/typec/ucsi/ucsi.c
> > > > > +++ b/drivers/usb/typec/ucsi/ucsi.c
> > > > > @@ -1192,11 +1192,6 @@ static int ucsi_register_port(struct ucsi *ucsi, int index)
> > > > >  out_unlock:
> > > > >  	mutex_unlock(&con->lock);
> > > > > -	if (ret && con->wq) {
> > > > > -		destroy_workqueue(con->wq);
> > > > > -		con->wq = NULL;
> > > > > -	}
> > > > > -
> > > > >  	return ret;
> > > > >  }
> > > > >
> > > > > ... but it's totally untested and I'm not entirely sure whether it's
> > > > > better.
> > > > that is any better?
> > > No, I don't think that's better. Right now I would prefer that we play
> > > it safe and revert.
> > >
> > > The conditions are different in the two places where the ports are
> > > unregistered in this driver. Therefore I don't think it makes sense
> > > to use a function like ucsi_unregister_connectors() that tries to
> > > cover both cases. It will always be a little bit fragile.
> > >
> > > Instead we could introduce a function that can be used to remove a
> > > single port. That would leave the handling of the conditions to the
> > > callers of the function, but it would still remove the boilerplate.
> > > That would be much safer IMO.
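One possible reading of the single-port idea, again as a user-space sketch only: struct sketch_port and every sketch_* function are hypothetical names, and no such helper is confirmed to exist in the driver. The helper tears down exactly one port without looking at its state, and each caller decides which ports to hand to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one connector's resources. */
struct sketch_port {
	bool wq;	/* workqueue exists */
	bool psy;	/* power supply registered */
	bool port;	/* typec port registered */
};

/* Releases everything held by a single port, whatever its state. */
static void sketch_unregister_port(struct sketch_port *p)
{
	p->psy = false;
	p->wq = false;
	p->port = false;
}

/*
 * Caller 1 (init error path): every port that registration touched,
 * including the one that failed halfway, is torn down.
 */
static void sketch_init_error_cleanup(struct sketch_port *ports,
				      size_t n_attempted)
{
	for (size_t i = 0; i < n_attempted; i++)
		sketch_unregister_port(&ports[i]);
}

/*
 * Caller 2 (normal removal): only fully registered ports are expected,
 * so this caller applies its own condition before tearing a port down.
 */
static void sketch_remove_all(struct sketch_port *ports, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!ports[i].port)
			continue;
		sketch_unregister_port(&ports[i]);
	}
}

/* Counts resources still held across all ports (for checking only). */
static size_t sketch_count_live(const struct sketch_port *ports, size_t n)
{
	size_t live = 0;

	for (size_t i = 0; i < n; i++)
		live += (size_t)ports[i].wq + ports[i].psy + ports[i].port;
	return live;
}
```

The point of the split is that the helper has no early-exit condition of its own, so a half-initialized port cannot cause the cleanup to skip anything. The conditions stay visible in the two callers.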
> > >
> > > But to fix this problem, I think we should revert.
> >
> > But the revert will happen on several stable branches, right?
>
> If someone sends it to me, yes :)
>
> {hint}

OK, will submit :)


Takashi