Re: Boot fails with 59faa4da7cd4 and 3accabda4da1

From: Linus Torvalds
Date: Fri Oct 10 2025 - 14:20:00 EST


On Fri, 10 Oct 2025 at 08:11, Phil Auld <pauld@xxxxxxxxxx> wrote:
>
> After several days of failed boots I've gotten it down to these two
> commits.
>
> 59faa4da7cd4 maple_tree: use percpu sheaves for maple_node_cache
> 3accabda4da1 mm, vma: use percpu sheaves for vm_area_struct cache
>
> The first is such an early failure it's silent. With just 3acca I
> get :
>
> [ 9.341152] BUG: kernel NULL pointer dereference, address: 0000000000000040
> [ 9.348115] #PF: supervisor read access in kernel mode
> [ 9.353264] #PF: error_code(0x0000) - not-present page
> [ 9.358413] PGD 0 P4D 0
> [ 9.360959] Oops: Oops: 0000 [#1] SMP NOPTI
> [ 9.365154] CPU: 21 UID: 0 PID: 818 Comm: kworker/u398:0 Not tainted 6.17.0-rc3.slab+ #5 PREEMPT(voluntary)
> [ 9.374982] Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.26.0 07/30/2025
> [ 9.382641] RIP: 0010:__pcs_replace_empty_main+0x44/0x1d0
> [ 9.388048] Code: ec 08 48 8b 46 10 48 8b 76 08 48 85 c0 74 0b 8b 48 18 85 c9 0f 85 e5 00 00 00 65 48 63 05 e4 ee 50 02 49 8b 84 c6 e0 00 00 00 <4c> 8b 68 40 4c 89 ef e8 b0 81 ff ff 48 89 c5 48 85 c0 74 1d 48 89

That decodes to

0: mov 0x10(%rsi),%rax
4: mov 0x8(%rsi),%rsi
8: test %rax,%rax
b: je 0x18
d: mov 0x18(%rax),%ecx
10: test %ecx,%ecx
12: jne 0xfd
18: movslq %gs:0x250eee4(%rip),%rax
20: mov 0xe0(%r14,%rax,8),%rax
28:* mov 0x40(%rax),%r13 <-- trapping instruction
2c: mov %r13,%rdi
2f: call 0xffffffffffff81e4
34: mov %rax,%rbp
37: test %rax,%rax
3a: je 0x59

which is the code around that barn_replace_empty_sheaf() call.

In particular, the trapping instruction is from get_barn(), it's the "->barn" in

return get_node(s, numa_mem_id())->barn;

so it looks like 'get_node()' is returning NULL here:

return s->node[node];

That 0x250eee4(%rip) is the per-cpu "numa_node" read, with "get_node()"
becoming these two instructions:

18: movslq %gs:numa_node(%rip), %rax # node
20: mov 0xe0(%r14,%rax,8),%rax # ->node[node]

and then that ->barn dereference is the trapping instruction that
tries to read node->barn:

28:* mov 0x40(%rax),%r13 # node->barn

but I did *not* look into why s->node[node] would be NULL.
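
As a rough sketch of why the faulting address comes out as 0x40: the structs below are made-up stand-ins, padded only so that the offsets match the 0xe0 and 0x40 displacements in the disassembly. They are not the real slab layout, and the names cache_model, node_model, barn_model, node_slot_addr() and barn_read_addr() are hypothetical. The sketch assumes 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the kernel's definitions. The padding is
 * chosen only to reproduce the displacements seen in the oops. */
struct barn_model { int unused; };

struct node_model {
	char pad[0x40];
	struct barn_model *barn;	/* read by "mov 0x40(%rax),%r13" */
};

struct cache_model {
	char pad[0xe0];
	struct node_model *node[4];	/* read by "mov 0xe0(%r14,%rax,8),%rax" */
};

/* Compile-time check that the model matches the disassembly offsets. */
static_assert(offsetof(struct node_model, barn) == 0x40, "barn offset");
static_assert(offsetof(struct cache_model, node) == 0xe0, "node[] offset");

/* Address loaded by the instruction at 0x20: s + 0xe0 + node * 8. */
static uintptr_t node_slot_addr(uintptr_t s, int node)
{
	return s + offsetof(struct cache_model, node)
		 + (uintptr_t)node * sizeof(struct node_model *);
}

/* Address read by the trapping instruction at 0x28: n + 0x40. */
static uintptr_t barn_read_addr(uintptr_t n)
{
	return n + offsetof(struct node_model, barn);
}
```

With a NULL node pointer, barn_read_addr(0) comes out as 0x40, which is the address in the "NULL pointer dereference" line of the oops.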

Over to you Vlastimil,

Linus