Re: [PATCH v4] locking/rwbase: Mitigate indefinite writer starvation

From: Mel Gorman
Date: Wed Feb 08 2023 - 15:19:37 EST


On Mon, Feb 06, 2023 at 03:30:35PM +0100, Thomas Gleixner wrote:
> Mel!

Hi :)

I'm not really online for the next several weeks, so further responses may
take ages. It's a coincidence that I'm online at the moment for an unrelated
matter and glancing through mail.

>
> On Fri, Jan 20 2023 at 14:08, Mel Gorman wrote:
> > dio_truncate is not a realtime application but indefinite writer starvation
> > is undesirable. The test case has one writer appending and truncating files
> > A and B while multiple readers read file A. The readers and writer are
> > contending for one file's inode lock. The writer never acquires it because
> > the readers keep reading until the writer is done, which never happens.
> >
> > This patch records a timestamp when the first writer is blocked. DL /
>
> git grep 'This patch' Documentation/process/
>

I'm aware of the rule but tend to forget it at times, as enforcement varies
between subsystems. The first sentence of the paragraph becomes:

Record a timestamp when the first writer is blocked and force all new
readers into the slow path once a timeout expires.

> > RT tasks can continue to take the lock for read indefinitely as long as
> > readers exist. Other readers can acquire the read lock unless a writer
> > has been blocked for a minimum of 4ms. This is sufficient to allow the
> > dio_truncate test case to complete within the 30-minute timeout.
>
> I'm not opposed to this, but what's the actual reason for this pulled
> out of thin air timeout?
>

No good reason; a value had to be picked. It happens to match the rwsem
cutoff for optimistic spinning, so it is at least an existing threshold for
"a lock failed to be acquired within a reasonable time period". It is also
arbitrary in the sense that it simply happened to be a value that allowed
the dio_truncate LTP test to complete in a reasonable time.
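To make the proposed rule concrete, here is a minimal userspace model of
the reader fast-path decision, assuming the caller supplies a nanosecond
clock. All names (struct rwlock_model, writer_block(), writer_acquired(),
reader_fastpath_ok(), WRITER_STARVE_NS) are hypothetical; this is not the
code in the patch or in kernel/locking/rwbase_rt.c. It only mirrors the
rules in the changelog: the first blocked writer records a timestamp,
DL/RT readers may always take the lock for read, and other readers are
forced into the slow path once that writer has waited at least 4ms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff mirroring the 4ms value discussed above. */
#define WRITER_STARVE_NS (4ULL * 1000 * 1000)

/*
 * Hypothetical, simplified lock state. A single flag stands in for
 * "at least one writer is blocked"; the real implementation differs.
 */
struct rwlock_model {
	bool writer_waiting;        /* a writer is blocked on the lock */
	uint64_t writer_blocked_ns; /* when the first writer blocked */
};

/* A writer blocks; only the first blocked writer records the time. */
static void writer_block(struct rwlock_model *l, uint64_t now_ns)
{
	if (!l->writer_waiting) {
		l->writer_waiting = true;
		l->writer_blocked_ns = now_ns;
	}
}

/* The waiting writer got the lock; clear the starvation state. */
static void writer_acquired(struct rwlock_model *l)
{
	l->writer_waiting = false;
}

/*
 * May a new reader take the lock on the fast path?
 * DL/RT readers always may, so they can still starve writers.
 * Other readers are pushed to the slow path once a writer has been
 * blocked for at least WRITER_STARVE_NS.
 */
static bool reader_fastpath_ok(const struct rwlock_model *l,
			       bool rt_reader, uint64_t now_ns)
{
	if (rt_reader || !l->writer_waiting)
		return true;
	assert(now_ns >= l->writer_blocked_ns); /* clock must not go back */
	return now_ns - l->writer_blocked_ns < WRITER_STARVE_NS;
}
```

With WRITER_STARVE_NS set to 0, the comparison is never true, so every
non-RT reader is forced into the slow path as soon as a writer waits.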

> What's the downside of actually forcing !RT readers into the slowpath
> once there is a writer waiting?
>

I don't know for sure because it's application dependent, but at minimum
I believe it would deviate from how generic rwsems behave, where a writer
optimistically spins for the same duration before forcing the handoff.
Whether that matters depends on the application, the ratio of readers to
writers and the number of concurrent readers.

--
Mel Gorman
SUSE Labs