Re: [PATCH] staging: pi433: fix race condition in pi433_open

From: Dan Carpenter
Date: Tue Jun 19 2018 - 06:43:18 EST


On Mon, Jun 18, 2018 at 11:11:36PM -0400, Hugo Lefeuvre wrote:
> Hi Dan,
>
> > We need to decrement device->users-- on the error paths as well.
> > This function was already slightly broken with respect to counting the
> > users, but let's not make it worse.
> >
> > I think it's still a tiny bit racy because it's not an atomic type.
>
> Oh right, I missed that. I'll fix it in the v2. :)
>
> > I'm not sure the error handling in open() works either. It's releasing
> > device->rx_buffer but there could be another user.
>
> Agree.
>
> > The ->rx_buffer
> > should be allocated in probe() instead of open() probably, no? And then
> > freed in pi433_remove(). Then once we put that in the right layer
> > it means we can just get rid of ->users...
>
> It would be great to get rid of this counter, indeed. But how to do it
> properly without breaking things ? It seems to be useful to me...

These things are refcounted so you can't unload the module while a file
is open. When we do an open it does a cdev_get(). When we call the
delete_module syscall, it checks the refcount in try_stop_module() ->
try_release_module_ref(). It will still let us force a release but
at that point you're expecting use after frees.

>
> For example, how do you handle the case where remove() is called but
> some operations are still running on existing fds ?
>
> What if remove frees the rx_buffer while a read() call executes this ?
>
> copy_to_user(buf, device->rx_buffer, bytes_received)
>
> rx_buffer is freed by release() because it's the only buffer from the
> device structure used in read/write/ioctl, meaning that we can only
> free it when we are sure that it isn't used anywhere anymore.
>
> So, we can't do it in remove() unless remove() is delayed until the
> last release() has returned.
>
> > The lines:
> >
> > 1008 if (!device->spi)
> > 1009 kfree(device);
> >
> > make no sort of sense at all... Fortunately it's not posssible for
> > device->spi to be NULL so it's dead code.
>
> Really ? device->spi is NULL-ed in remove() so that operations on
> remaining fds can detect remove() was already called and free remaining
> resources:
>
> 1296 /* make sure ops on existing fds can abort cleanly */
> 1297 device->spi = NULL;
>
> Thanks for your time !

That's when we're unloading the module so there aren't any users left.

regards,
dan carpenter