Re: [PATCH] workqueue: Fix memory ordering race in queue_work*()

From: Herbert Xu
Date: Wed Aug 17 2022 - 01:06:44 EST


On Tue, Aug 16, 2022 at 09:41:52AM -0700, Linus Torvalds wrote:
> So I think the code problem is easy, I think the real problem here has
> always been bad documentation, and it would be really good to clarify
> that.
>
> Comments?

The problem is that test_and_set_bit has been unambiguously
documented to have memory barriers since 2005:

commit 3085f02b869d980c5588f3e8fb136b0b465a2759
Author: David S. Miller <davem@xxxxxxxxxxxxxxxxxx>
Date: Fri Feb 4 23:39:15 2005 -0800

[DOC]: Add asm/atomic.h asm/bitops.h implementation specification.

And this is what it says:

+ int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
+ int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
+ int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

...snip...

+These routines, like the atomic_t counter operations returning values,
+require explicit memory barrier semantics around their execution. All
+memory operations before the atomic bit operation call must be made
+visible globally before the atomic bit operation is made visible.
+Likewise, the atomic bit operation must be visible globally before any
+subsequent memory operation is made visible. For example:
+
+ obj->dead = 1;
+ if (test_and_set_bit(0, &obj->flags))
+ /* ... */;
+ obj->killed = 1;

This file wasn't removed until 16/11/2020 by f0400a77ebdc.

In that time people who wrote code using test_and_set_bit could have
legitimately relied on the memory barrier as documented. Changing
this retrospectively is dangerous.
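
To make the documented guarantee concrete, here is a rough userspace
sketch in C11 atomics of what "barriers around the operation" means for
the obj->dead / obj->killed example quoted above. Everything in it is
an assumption for illustration only: sketch_test_and_set_bit(),
struct obj and obj_kill() are made-up names, and the two seq_cst fences
are merely an approximation of the ordering the old document describes,
not the kernel's actual implementation on any architecture.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical userspace stand-in for a fully ordered test_and_set_bit().
 * The fences before and after the relaxed read-modify-write approximate
 * the documented rule: earlier memory operations become visible before
 * the bit operation, and the bit operation before later ones.
 */
static inline bool sketch_test_and_set_bit(unsigned long nr,
					   _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);
	unsigned long old;

	atomic_thread_fence(memory_order_seq_cst);	/* barrier before */
	old = atomic_fetch_or_explicit(&addr[nr / SKETCH_BITS_PER_LONG],
				       mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* barrier after */

	return (old & mask) != 0;
}

/* Hypothetical object mirroring the fields in the quoted example. */
struct obj {
	_Atomic int dead;
	_Atomic unsigned long flags;
	_Atomic int killed;
};

/*
 * Same sequence as the quoted example. With the fences above, an
 * observer that sees bit 0 set is also meant to see dead == 1, and an
 * observer that sees killed == 1 is also meant to see bit 0 set.
 * Returns the previous value of bit 0.
 */
static bool obj_kill(struct obj *obj)
{
	bool was_set;

	assert(obj != NULL);

	atomic_store_explicit(&obj->dead, 1, memory_order_relaxed);
	was_set = sketch_test_and_set_bit(0, &obj->flags);
	atomic_store_explicit(&obj->killed, 1, memory_order_relaxed);

	return was_set;
}
```

A caller that relied on the old wording is depending on exactly those
two fences being implied by the bit operation. Drop either one and the
stores to dead and killed are free to be reordered against the flag
update as seen by another CPU.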

I'm fine with introducing new primitives that have different
properties, and then converting the existing users of test_and_set_bit
over on a case-by-case basis.

Cheers,
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt