Re: [syzbot] riscv/fixes test error: lost connection to test machine

From: Alexandre Ghiti
Date: Sat May 28 2022 - 04:10:16 EST


On 5/27/22 19:12, Dmitry Vyukov wrote:
On Fri, 27 May 2022 at 19:04, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
On Fri, 27 May 2022 at 16:01, Alexandre Ghiti
<alexandre.ghiti@xxxxxxxxxxxxx> wrote:
On Friday, May 27, 2022 at 3:55:24 PM UTC+2 Dmitry Vyukov wrote:
On Fri, 27 May 2022 at 15:50, Alexandre Ghiti
<alexand...@xxxxxxxxxxxxx> wrote:
On Friday, May 27, 2022 at 3:02:01 PM UTC+2 Dmitry Vyukov wrote:
On Fri, 27 May 2022 at 14:55, syzbot
<syzbot+2c5da6...@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hello,

syzbot found the following issue on:

HEAD commit: c932edeaf6d6 riscv: dts: microchip: fix gpio1 reg property..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
console output: https://syzkaller.appspot.com/x/log.txt?x=1418add5f00000
kernel config: https://syzkaller.appspot.com/x/.config?x=aa6b5702bdf14a17
dashboard link: https://syzkaller.appspot.com/bug?extid=2c5da6a0a16a0c4f34aa
compiler: riscv64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: riscv64

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+2c5da6...@xxxxxxxxxxxxxxxxxxxxxxxxx
CONFIG_KASAN_VMALLOC allows the riscv kernel to boot, but now Go
processes have started crashing with:

1970/01/01 00:06:55 fuzzer started
runtime: lfstack.push invalid packing: node=0xffffff5908a940 cnt=0x1
packed=0xffff5908a9400001 -> node=0xffff5908a940
fatal error: lfstack.push
runtime stack:
runtime.throw({0x30884c, 0xc})
/usr/local/go/src/runtime/panic.go:1198 +0x60
runtime.(*lfstack).push(0xdb3850, 0xffffff5908a940)
/usr/local/go/src/runtime/lfstack.go:30 +0x1a8

Go runtime tries to shove some data into the upper 16 bits of pointers
assuming they are unused.
However, the original pointer node=0xffffff5908a940 suggests riscv now
has a 56-bit user-space address space?
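
To make the failure in the log concrete, here is a simplified model of the packing in C. It is a sketch, not the Go runtime's code: it assumes 48 significant address bits and 8-byte-aligned nodes, and the names ADDR_BITS, CNT_BITS, lf_pack, lf_unpack and lf_roundtrip_ok are made up for illustration.

```c
#include <stdint.h>

/* Assumed for illustration: 48 address bits; nodes are 8-byte aligned,
 * so 3 low bits are free and the counter gets 16 + 3 = 19 bits. */
#define ADDR_BITS 48
#define CNT_BITS  (64 - ADDR_BITS + 3)

/* Shift the address up so the counter fits in the low bits. */
static uint64_t lf_pack(uint64_t node, uint64_t cnt)
{
    return (node << (64 - ADDR_BITS)) | (cnt & ((UINT64_C(1) << CNT_BITS) - 1));
}

/* Recover the address; only ADDR_BITS bits survive the round trip. */
static uint64_t lf_unpack(uint64_t packed)
{
    return (packed >> CNT_BITS) << 3;
}

/* Returns 1 when the node survives pack/unpack unchanged. */
static int lf_roundtrip_ok(uint64_t node, uint64_t cnt)
{
    return lf_unpack(lf_pack(node, cnt)) == node;
}
```

With the values from the log, lf_pack(0xffffff5908a940, 1) gives 0xffff5908a9400001 and unpacking returns 0xffff5908a940, so the top 8 bits of the 56-bit address are lost and the round-trip check fails.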

Yes, sv57 was merged recently.

Documentation/riscv/vm-layout.rst claims 48-bit pointers:
"
The RISC-V privileged architecture document states that the 64bit addresses
"must have bits 63–48 all equal to bit 47, or else a page-fault exception will
occur.":

Thanks for pointing that out. I extracted that from the specification before sv57 was specified; I'll fix it.
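
The rule quoted above generalizes across modes: all bits above the top virtual-address bit must equal that bit (bit 38 for sv39, bit 47 for sv48, bit 56 for sv57). A minimal sketch of that check follows; the helper name is_canonical is hypothetical and is not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if bits 63..va_bits of addr all equal bit (va_bits - 1).
 * va_bits is 39, 48 or 57 for sv39, sv48 and sv57 respectively. */
static int is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits >= 1 && va_bits <= 64);
    uint64_t top = addr >> (va_bits - 1);            /* bit va_bits-1 and above */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1); /* same width, all set */
    return top == 0 || top == all_ones;
}
```

The pointer from the Go crash, 0xffffff5908a940, passes this check for va_bits = 57 but not for va_bits = 48.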

The current kernel code will use sv57, as it is supported and advertised by qemu. To my knowledge, you can't downgrade to sv48 except by recompiling qemu with the following change:

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 6dbe9b541f..a64b50ed75 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -637,7 +637,7 @@ static const char valid_vm_1_10_64[16] = {
[VM_1_10_MBARE] = 1,
[VM_1_10_SV39] = 1,
[VM_1_10_SV48] = 1,
- [VM_1_10_SV57] = 1
+ [VM_1_10_SV57] = 0
};

/* Machine Information Registers */

...
0000000000000000 | 0 | 0000003fffffffff | 256 GB |
user-space virtual memory, different per mm
"
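
The 256 GB figure in the sv39 row follows from user space getting the lower half of the virtual address range. A small sketch of that arithmetic, with a hypothetical helper name user_va_top:

```c
#include <assert.h>
#include <stdint.h>

/* Highest user-space address when user space is the lower half of a
 * va_bits-wide virtual address space: 2^(va_bits - 1) - 1. */
static uint64_t user_va_top(unsigned va_bits)
{
    assert(va_bits >= 2 && va_bits <= 64);
    return (UINT64_C(1) << (va_bits - 1)) - 1;
}
```

For va_bits = 39 this yields 0x3fffffffff, matching the table row above; for 48 and 57 it yields 0x7fffffffffff and 0xffffffffffffff.
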
There is no kernel config to force SV48/39, right?

No, we rely on what the hardware advertises: if it supports sv57, we'll go for sv57; if not, we'll try sv48, and so on. I had some patches to force the downgrade via the device tree, but they never got merged.
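
The fallback order can be sketched as below. This is only an illustration of the selection logic, not the kernel's implementation: the enum, pick_vm_mode and the probe callbacks are hypothetical names, and the callback stands in for however the hardware is actually probed.

```c
#include <stddef.h>

/* Hypothetical identifiers, for this sketch only. */
enum vm_mode { VM_NONE = 0, VM_SV39 = 39, VM_SV48 = 48, VM_SV57 = 57 };

/* Try the widest mode first and fall back to narrower ones.
 * probe() reports whether the hardware accepts the given mode. */
static enum vm_mode pick_vm_mode(int (*probe)(enum vm_mode))
{
    static const enum vm_mode order[] = { VM_SV57, VM_SV48, VM_SV39 };
    for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
        if (probe(order[i]))
            return order[i];
    }
    return VM_NONE;
}

/* Example stand-ins for hardware: one accepts every mode, one lacks sv57. */
static int probe_sv57_capable(enum vm_mode m) { (void)m; return 1; }
static int probe_sv48_capable(enum vm_mode m) { return m != VM_SV57; }
```

With probe_sv57_capable the function returns VM_SV57; with probe_sv48_capable it falls back to VM_SV48.
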
+original CC list

FTR, I sent a Go runtime change to support SV57:
https://go-review.googlesource.com/c/go/+/409055


Is CONFIG_CMDLINE broken on riscv?
I am running with:

CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0
sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb
nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000
nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000
nf-conntrack-sane.ports=20000 binder.debug_mask=0
rcupdate.rcu_expedited=1 no_hash_pointers page_owner=on
sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4
secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1
msr.allow_writes=off dummy_hcd.num=2 smp.csd_lock_timeout=300000
watchdog_thresh=165 workqueue.watchdog_thresh=420
sysctl.net.core.netdev_unregister_timeout_secs=420 panic_on_warn=1"


This command line is 608 characters long, but we are still stuck with the default COMMAND_LINE_SIZE of 512; I imagine that is the problem. I proposed a patch last year to bump that to 1024, but it never got merged: https://lore.kernel.org/lkml/CAEn-LTqTXCEC=bXTvGyo8SNL0JMWRKtiSwQB7R=Pc4uhxZUruA@xxxxxxxxxxxxxx/T/#m4b45019dc0f5573f2a50c1f6007c5109fa35efff
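
A quick way to see how much of a command line would be lost is sketched below. It assumes the buffer holds at most size - 1 characters plus a terminating NUL; the helper name cmdline_overflow is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Value stated above as the current riscv default. */
#define COMMAND_LINE_SIZE 512

/* Number of characters that do not fit in a buffer of `size` bytes
 * (one byte is reserved for the terminating NUL); 0 means it fits. */
static size_t cmdline_overflow(const char *cmdline, size_t size)
{
    size_t len = strlen(cmdline);
    size_t room = size > 0 ? size - 1 : 0;
    return len > room ? len - room : 0;
}
```

Under that assumption, a 608-character command line overflows a 512-byte buffer by 97 characters, so any parameter in that tail would not be seen by the kernel.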



But I am getting BUGs with the default timeout:
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:4:2039]
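
Why a 23s report points at the default rather than watchdog_thresh=165 can be sketched as follows. The sketch assumes the soft-lockup report fires after twice watchdog_thresh seconds and that the default watchdog_thresh is 10; the helper names are hypothetical.

```c
/* Assumptions for this sketch: the soft-lockup threshold is
 * 2 * watchdog_thresh seconds, and the default watchdog_thresh is 10. */
#define DEFAULT_WATCHDOG_THRESH 10

static int softlockup_thresh_secs(int watchdog_thresh)
{
    return 2 * watchdog_thresh;
}

/* Returns 1 if a report after `stuck_secs` is possible with this setting. */
static int report_possible(int stuck_secs, int watchdog_thresh)
{
    return stuck_secs >= softlockup_thresh_secs(watchdog_thresh);
}
```

Under these assumptions a report at 23s is consistent with the default threshold of 20s, but not with watchdog_thresh=165, which would delay the report to 330s.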

_______________________________________________
linux-riscv mailing list
linux-riscv@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/linux-riscv