Re: [PATCH 1/2] nfsd: use threads array as-is in netlink interface

From: Chuck Lever
Date: Fri Jun 13 2025 - 11:40:13 EST


On 6/13/25 11:23 AM, Benjamin Coddington wrote:
> On 13 Jun 2025, at 10:56, Chuck Lever wrote:
>
>> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
>>> We don't consider it acceptable to allow known defects to persist in our
>>> products just because they are bleeding edge.
>>
>> I'm not letting this issue persist. Proper testing takes time.
>>
>> The patch description and discussion around this change did not include
>> any information about its pervasiveness and only a little about its
>> severity. I used my best judgement and followed my usual rules, which
>> are:
>>
>> 1. Crashers, data corrupters, and security bugs with public bug reports
>>    and confirmed fix effectiveness go in as quickly as we can test.
>>    Note well that we have to balance the risk of introducing regressions
>>    in this case, since going in quickly means the fix lacks significant
>>    test experience.
>>
>> 1a. Rashes and bug bites require application of topical hydrocortisone.
>
> :) no rash here, this response is very soothing.
>
>> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>>    are there for four. I have CI running daily on that branch, and
>>    sometimes it takes a while for a problem to surface and be noticed.
>>
>> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>>    as it takes for them to matriculate into linux-next and fs-next.
>>
>> 4. If the patch fixes an issue that was introduced in the most recent
>>    merge window, it goes in -fixes .
>>
>> 5. If the patch fixes an issue that is already in released kernels
>>    (and we are at rule 5 because the patch does not fix an immediate
>>    issue) then it goes in -next .
>>
>> These evidence-oriented guidelines are in place to ensure that we don't
>> panic and rush commits into the kernel without careful review and
>> testing. There have been plenty of times when a fix that was pushed
>> urgently was not complete or even made things worse. It's a long
>> pipeline on purpose.
>
> I totally understand, thanks very much for having a set of rules and
> guidelines and even more for taking the time to spell them out here.

Apologies for the length. I wanted to get these out in the open just
so you and others can slap me with a clue bat if I'm doing something
vastly strange or inappropriate.


> I wanted to express that Red Hat does consider all of its releases to be
> important to fix and maintain. I'd like to speak against arguments about fix
> urgency based on distro versions. I think in this case we innocently crept
> into these arguments as Jeff presented evidence that the problem exists in
> the wild.

I was estimating pervasiveness based on the position of the RHEL 10
distro in its life cycle, nothing more.


>> The issues with this patch were:
>>
>> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>>   fixes always seem to happen during -rc7 ;-)
>>
>> - The Fixes: tag refers to a commit that was several releases ago, and
>>   I am not aware of specific reports of anyone hitting a similar issue.
>>
>> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>>   still only on its GA release. Therefore my estimation is that the
>>   number of potentially impacted customers will be small for some time,
>>   enough time for us to test Jeff's fix appropriately.
>
> While this is true, I hope we can still treat every release version equally
> /if/ we make any arguments about urgency based on what's currently released
> in a particular distro. Your point is a good counter-argument to Jeff's
> assertion that the problem has been widely distributed - but it does start
> to creep into a space which feels like we're treating certain early versions
> of a specific distro differently and didn't sit well for me. I'd rather not
> have our upstream work or decisions appear to favor a particular distro.

Understood. I hope I convinced you that I was merely making an evidence-
based estimation about the pervasiveness of any problem this patch might
have been attempting to address.

The shorthand term "bleeding edge" was not intended to be disrespectful,
only descriptive.


--
Chuck Lever