Re: [PATCH v2] buffer: Fix I/O error due to ARM read-after-read hazard

From: Russell King - ARM Linux admin
Date: Wed Nov 13 2019 - 05:32:12 EST


On Wed, Nov 13, 2019 at 10:23:58AM +0000, Will Deacon wrote:
> On Tue, Nov 12, 2019 at 10:39:01AM -0800, Linus Torvalds wrote:
> > On Tue, Nov 12, 2019 at 10:22 AM Catalin Marinas
> > <catalin.marinas@xxxxxxx> wrote:
> > >
> > > OK, so this includes changing test_bit() to perform a READ_ONCE.
> >
> > That's not going to happen.
>
> Ok, I'll stick my neck out here, but if test_bit() is being used to read
> a bitmap that is being concurrently modified (e.g. by set_bit() which boils
> down to atomic_long_or()), then why isn't READ_ONCE() required? Right now,
> test_bit takes a 'const volatile unsigned long *addr' argument, so I don't
> see that you'll get a change in codegen except on alpha and, with this
> erratum, arm32.

I'm not entirely clear what you're suggesting, so I'll just pick the
scenario that I think you're talking about - but I'm not sure it's the
one you're intending.

Using test_bit() in one thread and set_bit() on the same bit in another
thread without locking is going to be racy by definition. It's entirely
possible for:

    Thread 1                    Thread 2
    bit = test_bit(...);
                                set_bit(...);
    /* use bit */

and here, bit == 0 even though thread 2 has set the bit by the time the
result is used. Testing a bit and then acting on the result is inherently
a non-atomic operation.
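
To make that interleaving concrete, here is a rough user-space sketch
using C11 atomics rather than the kernel bitops. The sketch_* helpers and
snapshot_is_stale() are hypothetical stand-ins invented for illustration,
and the two "threads" are simply replayed in program order so the outcome
is deterministic:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/* Hypothetical stand-in for test_bit(): a plain (relaxed) read of one bit. */
static bool sketch_test_bit(unsigned int nr, atomic_ulong *word)
{
	assert(nr < BITS_PER_WORD);
	return (atomic_load_explicit(word, memory_order_relaxed) >> nr) & 1UL;
}

/* Hypothetical stand-in for set_bit(): an atomic OR with no ordering. */
static void sketch_set_bit(unsigned int nr, atomic_ulong *word)
{
	assert(nr < BITS_PER_WORD);
	atomic_fetch_or_explicit(word, 1UL << nr, memory_order_relaxed);
}

/*
 * Replays the interleaving above in a single thread. Returns true when
 * the value sampled by "thread 1" no longer matches the word by the time
 * it would be used.
 */
static bool snapshot_is_stale(void)
{
	atomic_ulong word;
	bool bit;

	atomic_init(&word, 0);
	bit = sketch_test_bit(3, &word);	/* thread 1: bit = test_bit(...) */
	sketch_set_bit(3, &word);		/* thread 2: set_bit(...)        */
	return bit != sketch_test_bit(3, &word);	/* thread 1: use bit     */
}
```

Both accesses here are individually atomic, yet the sampled value is
still out of date by the time it is used. A stronger load would not
change that, since the gap is between the read and the use.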

This is why we have test_and_set_bit() and friends that atomically test
that a bit is clear before setting it. Where this is especially
important is for some filesystems, as they use test_and_xxx_bit() to
manage their allocation bitmaps.
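
As a rough illustration of the allocation-bitmap case, the sketch below
uses C11 atomic_fetch_or() as a user-space analogue of
test_and_set_bit(). It is not the kernel implementation, and
sketch_test_and_set_bit() and sketch_alloc_slot() are hypothetical names
made up for this example:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (8 * sizeof(unsigned long))

/*
 * Hypothetical analogue of test_and_set_bit(): sets the bit and returns
 * its previous value in one atomic read-modify-write step.
 */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
	unsigned long mask;

	assert(nr < BITS_PER_WORD);
	mask = 1UL << nr;
	return (atomic_fetch_or(word, mask) & mask) != 0;
}

/*
 * Claim the first free slot in a one-word bitmap. Returns the slot
 * index, or -1 if every bit is already taken.
 */
static int sketch_alloc_slot(atomic_ulong *bitmap)
{
	for (unsigned int nr = 0; nr < BITS_PER_WORD; nr++)
		if (!sketch_test_and_set_bit(nr, bitmap))
			return (int)nr;
	return -1;
}
```

Because the test and the set happen in a single atomic operation, two
callers racing for the same bit cannot both see it as clear, so only one
of them gets the slot.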

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up