Re: [PATCH] locking/osq_lock: fix a data race in osq_wait_next

From: Marco Elver
Date: Wed Jan 22 2020 - 17:39:16 EST




On Wed, 22 Jan 2020, Qian Cai wrote:

>
>
> > On Jan 22, 2020, at 11:59 AM, Will Deacon <will@xxxxxxxxxx> wrote:
> >
> > I don't understand this; 'next' is a local variable.
> >
> > Not keen on the onslaught of random "add a READ_ONCE() to shut the
> > sanitiser up" patches we're going to get from kcsan :(
>
> My fault. I suspect it is node->next. I'll do a bit more testing to confirm.

If possible, decode the report and get the line numbers. I have observed
a data race in osq_lock before; however, this is the only one I have
seen there recently:

read to 0xffff88812c12d3d4 of 4 bytes by task 23304 on cpu 0:
osq_lock+0x170/0x2f0 kernel/locking/osq_lock.c:143

	while (!READ_ONCE(node->locked)) {
		/*
		 * If we need to reschedule bail... so we can block.
		 * Use vcpu_is_preempted() to avoid waiting for a preempted
		 * lock holder:
		 */
-->		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
			goto unqueue;

		cpu_relax();
	}

where

static inline int node_cpu(struct optimistic_spin_node *node)
{
-->	return node->cpu - 1;
}


write to 0xffff88812c12d3d4 of 4 bytes by task 23334 on cpu 1:
osq_lock+0x89/0x2f0 kernel/locking/osq_lock.c:99

bool osq_lock(struct optimistic_spin_queue *lock)
{
	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
	struct optimistic_spin_node *prev, *next;
	int curr = encode_cpu(smp_processor_id());
	int old;

	node->locked = 0;
	node->next = NULL;
-->	node->cpu = curr;
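
For illustration only: the two marked lines are a plain read and a plain
write of the same 4-byte field, node->cpu. Below is a minimal userspace
sketch of what marking both accesses could look like, using C11 relaxed
atomics as a rough stand-in for READ_ONCE()/WRITE_ONCE(). The names
struct spin_node and node_set_cpu() are hypothetical and do not exist in
kernel/locking/osq_lock.c, and this is not a proposed patch; whether the
race needs marking at all is a separate question.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct optimistic_spin_node (cpu field only). */
struct spin_node {
	atomic_int cpu;		/* encoded CPU number, 0 means "no CPU" */
};

/* CPU n is stored as n + 1, mirroring the "- 1" in node_cpu() above. */
static inline int encode_cpu(int cpu_nr)
{
	assert(cpu_nr >= 0);
	return cpu_nr + 1;
}

/* Marked write: rough analogue of WRITE_ONCE(node->cpu, curr). */
static inline void node_set_cpu(struct spin_node *node, int cpu_nr)
{
	atomic_store_explicit(&node->cpu, encode_cpu(cpu_nr),
			      memory_order_relaxed);
}

/* Marked read: rough analogue of READ_ONCE(node->cpu) - 1. */
static inline int node_cpu(struct spin_node *node)
{
	return atomic_load_explicit(&node->cpu, memory_order_relaxed) - 1;
}
```

Relaxed ordering is used here because the sketch only aims to make the
concurrent accesses well-defined; it does not add any ordering between
them.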


Thanks,
-- Marco