Re: [PATCH] bpf: restrict verifier access to bpf_lru_node.ref

From: Shankari Anand
Date: Wed Jul 16 2025 - 02:32:58 EST


Hello,
> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;

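Thanks for the clarification. I misattributed the race to the ref
field. The KCSAN report actually points to a data race between a
reader, lookup_nulls_elem_raw(), which reads the hash and key fields of
an htab_elem, and a writer, bpf_lru_pop_free(), which reinitializes the
same element (including its hash) after taking it from the LRU freelist.
Because elements on the freelist are reused without waiting for an RCU
grace period, a reader that found the element before it was freed can
still be reading it while it is being rewritten.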
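To make the interleaving concrete, here is a deterministic, single-threaded
userspace replay of it. All names (struct elem, freelist_push(),
freelist_pop(), replay_stale_read()) are hypothetical; they only loosely
mirror htab_elem and the LRU freelist, and the real locking, the nulls-list
handling and the actual bpf_lru code are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, stripped-down stand-in for struct htab_elem: only the
 * fields involved in the report, plus a freelist link. */
struct elem {
	struct elem *next_free;
	uint32_t hash;
	char key[8];
};

/* A LIFO freelist with no grace-period tracking, i.e. the behaviour
 * described in this thread: freed elements are immediately reusable. */
static struct elem *free_head;

static void freelist_push(struct elem *e)
{
	e->next_free = free_head;
	free_head = e;
}

static struct elem *freelist_pop(void)
{
	struct elem *e = free_head;

	if (e)
		free_head = e->next_free;
	return e;
}

/*
 * Replays one problematic interleaving step by step, so the outcome is
 * deterministic. Returns the hash value the reader observes after it
 * has already found the element.
 */
static uint32_t replay_stale_read(void)
{
	struct elem slot = { .hash = 0x1111, .key = "old" };
	struct elem *reader_ptr, *reused;

	/* Reader (inside an RCU read-side section) finds the element. */
	reader_ptr = &slot;

	/* Updater removes it and pushes it straight onto the freelist. */
	freelist_push(&slot);

	/* Another updater pops the same memory and reinitializes it,
	 * analogous to the "= hash" store in the LRU pop path. */
	reused = freelist_pop();
	assert(reused == &slot);
	reused->hash = 0x2222;
	strcpy(reused->key, "new");

	/* Reader now compares hash/key and sees the new element's data. */
	return reader_ptr->hash;
}
```

Calling replay_stale_read() returns 0x2222 rather than the 0x1111 the
reader started with: nothing between the push and the pop forces the
reader to finish first, which is the gap a grace period would close.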

> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.

To resolve this, would it make sense to ensure that elements pushed to
the freelist can only be popped and reused after an RCU grace period has
elapsed? That would follow the same pattern other RCU users in the kernel
rely on before freeing or reusing an object.
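As a rough illustration of the idea, below is a userspace model of a
freelist that defers reuse until a grace period has elapsed. In the kernel
this could be built on call_rcu() or on the polled helpers
get_state_synchronize_rcu()/poll_state_synchronize_rcu(); this sketch only
mimics the polled style with a hand-advanced counter. Everything in it
(gp_completed, gp_end(), elem_free(), elem_alloc()) is hypothetical and not
taken from bpf_lru.c or percpu_freelist.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical count of completed grace periods. Each gp_end() call
 * models one full grace period that began after every earlier
 * elem_free(). Counter wrap-around is ignored in this sketch. */
static unsigned long gp_completed;

struct elem {
	struct elem *next;
	unsigned long free_cookie; /* gp_completed value needed for reuse */
	uint32_t hash;
};

/* Freed elements that readers may still see, oldest first. */
static struct elem *pending_head, *pending_tail;
/* Elements whose grace period has elapsed; safe to hand out. */
static struct elem *ready_head;

static void gp_end(void)
{
	gp_completed++;
}

/* Free: record which grace period must finish, then queue (FIFO). */
static void elem_free(struct elem *e)
{
	e->free_cookie = gp_completed + 1;
	e->next = NULL;
	if (pending_tail)
		pending_tail->next = e;
	else
		pending_head = e;
	pending_tail = e;
}

/* Alloc: only returns elements whose grace period has completed. */
static struct elem *elem_alloc(void)
{
	struct elem *e;

	/* FIFO order keeps cookies nondecreasing, so stop at the first
	 * element that is not yet safe. */
	while (pending_head && pending_head->free_cookie <= gp_completed) {
		e = pending_head;
		pending_head = e->next;
		if (!pending_head)
			pending_tail = NULL;
		e->next = ready_head;
		ready_head = e;
	}

	e = ready_head;
	if (!e)
		return NULL; /* nothing is safe to reuse yet */
	ready_head = e->next;
	assert(e->free_cookie <= gp_completed);
	return e;
}
```

The NULL return is where the real maps would need a policy (for example
what an update does when no element is safe to reuse yet), which this sketch
does not address. As Martin noted, the same deferral would presumably apply
to the percpu_freelist used by the regular hashmap.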

--
Regards,
Shankari



On Wed, Jul 16, 2025 at 2:57 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 7/15/25 7:49 AM, Alexei Starovoitov wrote:
> > Also you misread the kcsan report.
> >
> > It says that 'read' comes from:
> >
> > read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> > lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
> >
> > which is reading hash and key of htab_elem while
> > write side actually writes hash too:
> > *(u32 *)((void *)node + lru->hash_offset) = hash;
> >
> > Martin,
> > is it really possible for these read/write to race ?
>
> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.