Re: [PATCH RFC 0/2] nfsd: issue POSIX_FADV_DONTNEED after READ/WRITE/COMMIT
From: NeilBrown
Date: Fri Jul 04 2025 - 03:35:30 EST
On Fri, 04 Jul 2025, Chuck Lever wrote:
> On 7/3/25 7:16 PM, NeilBrown wrote:
> > On Fri, 04 Jul 2025, Jeff Layton wrote:
> >> Chuck and I were discussing RWF_DONTCACHE and he suggested that this
> >> might be an alternate approach. My main gripe with DONTCACHE was that it
> >> kicks off writeback after every WRITE operation. With NFS, we generally
> >> get a COMMIT operation at some point. Allowing us to batch up writes
> >> until that point has traditionally been considered better for
> >> performance.
> >
> > I wonder if that traditional consideration is justified, give your
> > subsequent results. The addition of COMMIT in v3 allowed us to both:
> > - delay kicking off writes
> > - not wait for writes to complete
> >
> > I think the second was always primary. Maybe we didn't consider the
> > value of the first enough.
> > Obviously the client caches writes and delays the start of writeback.
> > Adding another delay on the serve side does not seem to have a clear
> > justification. Maybe we *should* kick-off writeback immediately. There
> > would still be opportunity for subsequent WRITE requests to be merged
> > into the writeback queue.
>
> Dave Chinner had the same thought a while back. So I've experimented
> with starting writes as part of nfsd_write(). Kicking off writes,
> even without waiting, is actually pretty costly, and it resulted in
> worse performance.
Was this with filemap_fdatawrite_range_kick() or something else?
>
> Now that .pc_release is called /after/ the WRITE response has been sent,
> though, that might be a place where kicking off writeback could be done
> without charging that latency to the client.
Certainly after is better. I was imagining kicking the writeback
threads, but they seems to only do whole filesystems. Maybe that is OK?
I doubt we want more than one thread kicking off write for any given
inode at a time. Maybe it would be best to add a "kicking_write" flag
in the filecache and if it is already set, then just add the range to
some range of pending-writes. Maybe.
I guess I should explore how to test :-)
NeilBrown