Bisected stability regression in 6.6
From: matoro
Date:  Sat Nov 11 2023 - 01:31:42 EST
Hi Helge, I have bisected a regression in kernel 6.6 that is causing userspace 
segfaults at a significantly increased rate.  There seems to be 
a pathological case triggered by the ninja build tool.  The test case I have 
been using is cmake with ninja backend to attempt to build the nghttp2 
package.  In 6.6, this segfaults, not at the same location every time, but 
with enough reliability that I was able to use it as a bisection test 
case, including immediately after a reboot.  In the kernel log, these show up 
as "trap #15: Data TLB miss fault" messages.  Now these messages can and do 
show up in 6.5 causing segfaults, but never immediately after a reboot and 
infrequently enough that the system is stable.  With kernel 6.6 I am 
completely unable to build nghttp2 under any circumstances.
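For anyone wanting to repeat the check described above on their own machine, here is a minimal sketch, not the exact procedure used for the bisection.  It assumes a kernel log dump is available as text and that the build can be re-run with a single command; the helper names `count_trap15` and `count_failed_runs` are made up for illustration, and only the trap message string is taken from this report.

```python
import subprocess
import sys

# Message text as quoted in this report; the rest of each log line is not assumed.
TRAP_MARKER = "trap #15: Data TLB miss fault"


def count_trap15(log_text):
    """Count kernel log lines that contain the trap #15 message."""
    return sum(1 for line in log_text.splitlines() if TRAP_MARKER in line)


def count_failed_runs(command, runs=5):
    """Run `command` `runs` times and return how many runs exited non-zero.

    `command` is an argument list, e.g. a rebuild of an already configured
    tree (hypothetical): ["cmake", "--build", "build", "--clean-first"].
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # A crashing ninja or compiler process makes the build exit non-zero.
        if result.returncode != 0:
            failures += 1
    return failures
```

A kernel could then be marked bad for bisection if `count_failed_runs(...)` is non-zero shortly after boot, with `count_trap15` applied to the text of the kernel log to confirm the failures coincide with the trap messages.  Repeating the build several times is a guess at what makes the verdict reliable, since the crash location varies between runs.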
I have bisected this down to the following commit:
$ git bisect good
3033cd4307681c60db6d08f398a64484b36e0b0f is the first bad commit
commit 3033cd4307681c60db6d08f398a64484b36e0b0f
Author: Helge Deller <deller@xxxxxx>
Date:   Sat Aug 19 00:53:28 2023 +0200
    parisc: Use generic mmap top-down layout and brk randomization
    parisc uses a top-down layout by default that exactly fits the generic
    functions, so get rid of arch specific code and use the generic version
    by selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
    Note that on parisc the stack always grows up and a "unlimited stack"
    simply means that the value as defined in CONFIG_STACK_MAX_DEFAULT_SIZE_MB
    should be used. So RLIM_INFINITY is not an indicator to use the legacy
    memory layout.
    Signed-off-by: Helge Deller <deller@xxxxxx>
 arch/parisc/Kconfig             | 17 +++++++++++++
 arch/parisc/kernel/process.c    | 14 -----------
 arch/parisc/kernel/sys_parisc.c | 54 +----------------------------------------
 mm/util.c                       |  5 +++-
 4 files changed, 22 insertions(+), 68 deletions(-)
I have tried applying ad4aa06e1d92b06ed56c7240252927bd60632efe ("parisc: Add 
nop instructions after TLB inserts") on top of 6.6, but it does NOT fix the 
issue.
Let me know if there is anything I can answer on this.  I can provide full 
remote access with BMC if it would help.