Re: [PATCH v2] nvmem: core: Fix race in nvmem_register()

From: Hector Martin
Date: Tue Jan 03 2023 - 10:14:23 EST


On 04/01/2023 00.06, Russell King (Oracle) wrote:
> Hi Hector,
>
> On Tue, Jan 03, 2023 at 10:48:52PM +0900, Hector Martin wrote:
>>>> @@ -822,11 +822,8 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
>>>> break;
>>>> }
>>>>
>>>> - if (rval) {
>>>> - ida_free(&nvmem_ida, nvmem->id);
>>>> - kfree(nvmem);
>>>> - return ERR_PTR(rval);
>>>> - }
>>>> + if (rval)
>>>> + goto err_gpiod_put;
>>>
>>> Why were the gpiod changes added to this patch? That should be a
>>> separate patch/discussion, as it is not relevant to the issue that
>>> you are reporting.
>>
>> Because freeing the device also calls gpiod_put() in the destructor,
>> jumping to this label is correct in every other instance below and
>> preserves existing behavior. This instance converges on the same code
>> path, so merging it is correct, and the gpiod_put() was missing from
>> this path to begin with, so this becomes a drive-by bug fix.
>>
>> If you don't like it I can remove it (i.e. reintroduce the bug for no
>> good reason) and you can submit this fix yourself, because I have no
>> incentive to waste time submitting a separate patch to fix a GPIO leak
>> in an error-path corner case in a subsystem I don't own. I have much
>> bigger things on which to spend my (ever-shrinking) willingness to
>> fight for upstream submissions.
>>
>> Seriously, what is wrong with y'all kernel people? No other open source
>> project wastes contributors' time with stupid nitpicks like this. I
>> found a bug, I fixed it, I then fixed the issues you pointed out, and I
>> don't have the time or energy to fight over this kind of nonsense next.
>> Do you want bugs fixed or not?
>
> This is not nonsense. We have always had a policy of one fix/change
> per patch, and in this case it makes complete and utter sense. Of
> course, the interpretation of "one change" is a matter of opinion.

The change here is the race condition fix. That change adds an error
cleanup path that calls gpiod_put(). It therefore seems logical to use
that path in the one extra case a few lines above, which should have
been releasing the GPIO anyway.
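To illustrate the centralized-unwind argument, here is a minimal userspace sketch. It is not the nvmem code: `fake_gpio_get()`, `fake_gpio_put()`, `fake_register()` and `fake_unregister()` are hypothetical stand-ins, and the `gpios_held` counter exists only to make a leak observable. Every failure after the GPIO is acquired jumps to one shared label, so the early check cannot forget the put.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPIO descriptor (NOT the kernel gpiod API). */
struct fake_gpio {
	int line;
};

/* Number of "GPIO" references currently held; a leak leaves this > 0. */
int gpios_held;

struct fake_gpio *fake_gpio_get(int line)
{
	struct fake_gpio *g = malloc(sizeof(*g));

	if (g) {
		g->line = line;
		gpios_held++;
	}
	return g;
}

void fake_gpio_put(struct fake_gpio *g)
{
	if (!g)
		return;
	assert(gpios_held > 0);	/* catch double puts */
	gpios_held--;
	free(g);
}

struct fake_dev {
	struct fake_gpio *wp_gpio;
};

/*
 * Hypothetical registration routine. fail_at picks a step to fail
 * (0 = succeed). Both failure points share the err_gpio_put label;
 * a bare "return NULL" at step 1 would leak the GPIO reference.
 */
struct fake_dev *fake_register(int fail_at)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;

	dev->wp_gpio = fake_gpio_get(7);
	if (!dev->wp_gpio)
		goto err_free_dev;

	if (fail_at == 1)	/* e.g. setting the device name fails */
		goto err_gpio_put;

	if (fail_at == 2)	/* e.g. a later registration step fails */
		goto err_gpio_put;

	return dev;

err_gpio_put:
	fake_gpio_put(dev->wp_gpio);
err_free_dev:
	free(dev);
	return NULL;
}

/* Success-path teardown releases the same resources in reverse order. */
void fake_unregister(struct fake_dev *dev)
{
	fake_gpio_put(dev->wp_gpio);
	free(dev);
}
```

Calling `fake_register(1)` or `fake_register(2)` should return NULL with `gpios_held` back at 0, while a successful registration holds one reference until `fake_unregister()`.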

Now,

>
> Your patch contains two bug fixes for problems:
> 1) publication of nvmem_device before it's fully setup (leading to the
> race) which has been around since the inception of nvmem stuff.
> 2) fixing a memory leak for gpiod stuff, caused by a recent patch
> 5544e90c8126 ("nvmem: core: add error handling for dev_set_name")
> from September 2022.

That's a fair argument for having two patches (I didn't know the gpiod
leak was introduced later). However, the backport is nontrivial anyway
if you want clean code: if we merge the code paths, the fix would end up
different in the backports and in mainline. That means we would now
need three patches for everything to apply properly, which is more
effort than I'm willing to put in for an issue I don't care about.

But the bigger problem is that this isn't what Srini replied with; he's
now saying my patch is outright broken, and I'm tired of this nonsense.

- Hector