Re: [PATCH] sparc64: fix hugetlb for sun4u

From: John Paul Adrian Glaubitz
Date: Sat Aug 09 2025 - 02:23:32 EST


Hi Anthony,

On Sat, 2025-08-09 at 00:37 +0200, John Paul Adrian Glaubitz wrote:
> > Maybe try enabling CONFIG_DEBUG_VM_IRQSOFF, CONFIG_DEBUG_VM, and perhaps
> > CONFIG_DEBUG_VM_PGFLAGS. That might help detect a problem closer to the
> > source. You can also try adding transparent_hugepage=never to the kernel
> > boot line to see if compiling in THP support but not using it is okay.
>
> OK, I will try that. But not today anymore. It's half past midnight now here in Germany
> and I was debugging this issue almost all day long. I'm glad to have finally been able
> to track this down to THP support being enabled.
>
> Maybe you can try whether you can reproduce this in QEMU as well.

OK, first data point: Setting CONFIG_TRANSPARENT_HUGEPAGE_NEVER=y makes the backtrace during
boot disappear with CONFIG_TRANSPARENT_HUGEPAGE=y. However, the bug still shows up later when
running "apt update && apt -y upgrade" again:

[ 170.472743] kernel BUG at fs/ext4/inode.c:1174!
[ 170.532313]               \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
[ 170.725707] apt(1085): Kernel bad sw trap 5 [#1]
[ 170.786396] CPU: 0 UID: 0 PID: 1085 Comm: apt Not tainted 6.16.0+ #35 VOLUNTARY
[ 170.883619] TSTATE: 0000004411001603 TPC: 000000000075ee68 TNPC: 000000000075ee6c Y: 00000000 Not tainted
[ 171.012868] TPC: <ext4_block_write_begin+0x408/0x480>
[ 171.079299] g0: 0000000000000000 g1: 0000000000000001 g2: 0000000000000000 g3: 0000000000000000
[ 171.193692] g4: fff0000007802340 g5: fff000023d194000 g6: fff0000004aa8000 g7: 0000000000000001
[ 171.308157] o0: 0000000000000023 o1: 0000000000d74b28 o2: 0000000000000496 o3: 0000000000101cca
[ 171.422531] o4: 0000000001568800 o5: 0000000000000000 sp: fff0000004aab161 ret_pc: 000000000075ee60
[ 171.541487] RPC: <ext4_block_write_begin+0x400/0x480>
[ 171.607814] l0: fff000000274c6a8 l1: 0000000000113cca l2: fff000000274c540 l3: 0000000000001000
[ 171.722195] l4: 0000000000000002 l5: 0000000000080000 l6: 0000000000012000 l7: 0000000000000001
[ 171.836568] i0: 0000000000000000 i1: 000c000000374400 i2: 0000000000001fc0 i3: 0000000000680000
[ 171.950944] i4: 0000000000000000 i5: 0000000000000000 i6: fff0000004aab251 i7: 00000000007625d8
[ 172.065317] I7: <ext4_da_write_begin+0x158/0x300>
[ 172.127075] Call Trace:
[ 172.159101] [<00000000007625d8>] ext4_da_write_begin+0x158/0x300
[ 172.238023] [<00000000005b856c>] generic_perform_write+0x8c/0x240
[ 172.318085] [<000000000074aef0>] ext4_buffered_write_iter+0x50/0x120
[ 172.401586] [<00000000006954e0>] vfs_write+0x2a0/0x400
[ 172.469059] [<0000000000695784>] ksys_write+0x44/0xe0
[ 172.535395] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
[ 172.612029] Disabling lock debugging due to kernel taint
[ 172.681796] Caller[00000000007625d8]: ext4_da_write_begin+0x158/0x300
[ 172.766430] Caller[00000000005b856c]: generic_perform_write+0x8c/0x240
[ 172.852213] Caller[000000000074aef0]: ext4_buffered_write_iter+0x50/0x120
[ 172.941429] Caller[00000000006954e0]: vfs_write+0x2a0/0x400
[ 173.014627] Caller[0000000000695784]: ksys_write+0x44/0xe0
[ 173.086684] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
[ 173.169033] Caller[0000000000000000]: 0x0
[ 173.221645] Instruction DUMP:
[ 173.221648] 110035d2
[ 173.260532] 7ff358e0
[ 173.291414] 90122328
[ 173.322289] <91d02005>
[ 173.353172] 80a06000
[ 173.384051] 02480010
[ 173.414937] d45fa7cf
[ 173.445815] d85fa7cf
[ 173.476697] 9736a000

So, even just compiling in the THP support code already triggers the bug.

Will now test with the debug flags enabled.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer
`. `' Physicist
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913