Re: [PATCH v2 0/2] fix gss seqno handling to be more rfc-compliant

From: Nikhil Jha
Date: Wed Jun 11 2025 - 15:05:32 EST


On Wed, Jun 11, 2025 at 02:54:09PM -0400, Chuck Lever wrote:
> On 6/11/25 2:50 PM, Nikhil Jha wrote:
> > On Thu, Mar 20, 2025 at 09:16:15AM -0400, Chuck Lever wrote:
> >> On 3/19/25 1:02 PM, Nikhil Jha via B4 Relay wrote:
> >>> When the client retransmits an operation (for example, because the
> >>> server is slow to respond), a new GSS sequence number is associated with
> >>> the XID. In the current kernel code the original sequence number is
> >>> discarded. Subsequently, if a response to the original request is
> >>> received there will be a GSS sequence number mismatch. A mismatch will
> >>> trigger another retransmit, possibly repeating the cycle, and after some
> >>> number of failed retries EACCES is returned.
> >>>
> >>> RFC2203, section 5.3.3.1 suggests a possible solution... “cache the
> >>> RPCSEC_GSS sequence number of each request it sends” and "compute the
> >>> checksum of each sequence number in the cache to try to match the
> >>> checksum in the reply's verifier." This is what FreeBSD’s implementation
> >>> does (rpc_gss_validate in sys/rpc/rpcsec_gss/rpcsec_gss.c).
> >>>
> >>> However, even with this cache, retransmits directly caused by a seqno
> >>> mismatch can still cause a bad message interleaving that results in this
> >>> bug. The RFC already suggests ignoring incorrect seqnos on the server
> >>> side, and this seems symmetric, so this patchset also applies that
> >>> behavior to the client.
> >>>
> >>> These two patches are *not* dependent on each other. I tested them by
> >>> delaying packets with a Python script hooked up to NFQUEUE. If it would
> >>> be helpful I can send this script along as well.
> >>>
> >>> Signed-off-by: Nikhil Jha <njha@xxxxxxxxxxxxxx>
> >>> ---
> >>> Changes since v1:
> >>> * Maintain the invariant that the first seqno is always first in
> >>> rq_seqnos, so that it doesn't need to be stored twice.
> >>> * Minor formatting, and resending with proper mailing-list headers so the
> >>> patches are easier to work with.
> >>>
> >>> ---
> >>> Nikhil Jha (2):
> >>> sunrpc: implement rfc2203 rpcsec_gss seqnum cache
> >>> sunrpc: don't immediately retransmit on seqno miss
> >>>
> >>> include/linux/sunrpc/xprt.h | 17 +++++++++++-
> >>> include/trace/events/rpcgss.h | 4 +--
> >>> include/trace/events/sunrpc.h | 2 +-
> >>> net/sunrpc/auth_gss/auth_gss.c | 59 ++++++++++++++++++++++++++----------------
> >>> net/sunrpc/clnt.c | 9 +++++--
> >>> net/sunrpc/xprt.c | 3 ++-
> >>> 6 files changed, 64 insertions(+), 30 deletions(-)
> >>> ---
> >>> base-commit: 7eb172143d5508b4da468ed59ee857c6e5e01da6
> >>> change-id: 20250314-rfc2203-seqnum-cache-52389d14f567
> >>>
> >>> Best regards,
> >>
> >> This seems like a sensible thing to do to me.
> >>
> >> Acked-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> >>
> >> --
> >> Chuck Lever
> >
> > Hi,
> >
> > We've been running this patch for a while now and noticed a (very silly
> > in hindsight) bug.
> >
> > maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i], seq, p, len);
> >
> > needs to be
> >
> > maj_stat = gss_validate_seqno_mic(ctx, task->tk_rqstp->rq_seqnos[i++], seq, p, len);
> >
> > Or the kernel gets stuck in a loop when you have more than two retries.
> > I can resend this patch but I noticed it's already made its way into
> > quite a few trees. Should this be a separate patch instead?
>
> The course of action depends on what trees you found the patch in.
>
>
> --
> Chuck Lever

It shows up here, so I think it's in v6.16-rc1 already.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.16-rc1&id=08d6ee6d8a10aef958c2af16bb121070290ed589

- Nikhil