Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110

From: Yury Norov
Date: Wed Feb 15 2023 - 19:31:41 EST


+ Kees Cook <keescook@xxxxxxxxxxxx>
+ Miguel Ojeda <ojeda@xxxxxxxxxx>
+ Nick Desaulniers <ndesaulniers@xxxxxxxxxx>

On Wed, Feb 15, 2023 at 09:24:52AM +0100, Bruno Goncalves wrote:
> On Tue, 14 Feb 2023 at 15:32, Yury Norov <yury.norov@xxxxxxxxx> wrote:
> >
> > On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
> > > Hello,
> > >
> > > recently when testing kernel with debug options set from net-next [1]
> > > and bpf-next [2] the following call trace happens:
> > >
> > Hi Bruno,
> >
> > Thanks for report.
> >
> > This looks weird, because the hop_cmp() spent for 3 month in -next till
> > now. Anyways, can you please share your NUMA configuration so I'll try
> > to reproduce the bug locally? What 'numactl -H' outputs?
> >
>
> Here is the output:
>
> numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
> node 0 size: 32063 MB
> node 0 free: 31610 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
> node 1 size: 32248 MB
> node 1 free: 31909 MB
> node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
> node 2 size: 32248 MB
> node 2 free: 31551 MB
> node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
> node 3 size: 32239 MB
> node 3 free: 31468 MB
> node distances:
> node 0 1 2 3
> 0: 10 21 31 21
> 1: 21 10 21 31
> 2: 31 21 10 21
> 3: 21 31 21 10
>
> Bruno

So, I was able to reproduce it, and it seems like a compiler issue.

The problem is that hop_cmp() calculates pointer to a previous hop
object unconditionally at the beginning of the function:

struct cpumask **prev_hop = *((struct cpumask ***)b - 1);

Obviously, for the first hop, there's no such thing like a previous
one, and later in the code 'prev_hop' is used conditionally on that:

k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);

To me the code above looks like it instructs the compiler to dereference
'b - 1' only if b != k->masks, i.e. when b is not the first hop. But GCC
does that unconditionally, which looks wrong.

If I defer dereferencing manually like in the snippet below, the kasan
warning goes away.

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 48838a05c008..5f297f81c574 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2081,14 +2081,14 @@ struct __cmp_key {

static int hop_cmp(const void *a, const void *b)
{
- struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
struct cpumask **cur_hop = *(struct cpumask ***)b;
struct __cmp_key *k = (struct __cmp_key *)a;

if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
return 1;

- k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
+ k->w = (b == k->masks) ? 0 :
+ cpumask_weight_and(k->cpus, (*((struct cpumask ***)b - 1))[k->node]);
if (k->w <= k->cpu)
return 0;

I don't understand why GCC doesn't optimize out unneeded dereferencing.
It does that even if I replace ternary operator with if-else construction.
To me it looks like a compiler bug.

However, I acknowledge that I'm not a great expert in C standard, so
it's quite possible that there may be some rule that prevents from
doing such optimizations, even for non-volatile variables.

Adding compiler people. Guys, could you please clarify on that?
If it's my fault, I'll submit fix shortly.

Thanks,
Yury