Re: [PATCH] printk: fix buffer overflow potential for print_text()

From: Sven Schnelle
Date: Fri Jan 22 2021 - 16:24:29 EST


John Ogness <john.ogness@xxxxxxxxxxxxx> writes:

> Before commit b6cf8b3f3312 ("printk: add lockless ringbuffer"),
> msg_print_text() would only write up to size-1 bytes into the
> provided buffer. Some callers expect this behavior and append
> a terminator to returned string. In particular:
>
> arch/powerpc/xmon/xmon.c:dump_log_buf()
> arch/um/kernel/kmsg_dump.c:kmsg_dumper_stdout()
>
> msg_print_text() has been replaced by record_print_text(), which
> currently fills the full size of the buffer. This causes a
> buffer overflow for the above callers.
>
> Change record_print_text() so that it will only use size-1 bytes
> for text data. Also, for paranoia sakes, add a terminator after
> the text data.
>
> And finally, document this behavior so that it is clear that only
> size-1 bytes are used and a terminator is added.
>
> Fixes: b6cf8b3f3312 ("printk: add lockless ringbuffer")
> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
> ---
> [..]

I'm seeing crashes on s390x with this patch while running the glibc
testsuite. The glibc test suite triggers a few FPU exceptions which
are printed to the kernel log by default. Looking at the crash dump,
i see that the console_drivers pointer seems to be overwritten while
printing that (user space) warning:

crash> print *console_drivers
$1 = {
name = "\247\071\377\373\354!\004\214\000\331\300\345\000\033\353_",
write = 0xa7190001eb119078,
read = 0xe6e320b0050090,
device = 0xa51e0010a7210001,
unblank = 0xa7291000b9e28012,
setup = 0xe320f0a00004e320,
exit = 0x208e0094eb112000,
match = 0xcec1c006e017f,
flags = -4984,
index = 133,
cflag = 8143136,
data = 0x800458108004ec12,
next = 0x5007eaf000000
}

crash> x/16i console_drivers
0x79009700: lghi %r3,-5
0x79009704: aghik %r2,%r1,1164
0x7900970a: brasl %r14,0x79386dc8
0x79009710: lghi %r1,1
0x79009714: laog %r1,%r1,120(%r9)
0x7900971a: llgc %r2,5(%r11)
0x79009720: llilh %r1,16
0x79009724: tmll %r2,1
0x79009728: lghi %r2,4096
0x7900972c: locgre %r1,%r2
0x79009730: lg %r2,160(%r15)
0x79009736: llc %r2,142(%r2)
0x7900973c: srlg %r1,%r1,0(%r2)
0x79009742: clij %r1,1,12,0x7900981e
0x79009748: cgij %r8,0,8,0x79009852
0x7900974e: la %r2,4(%r8)

crash> sym 0x79009700
79009700 (t) iomap_finish_ioend+192
/usr/src/debug/kernel-5.10.fc33/linux-5.11.0-20210122.rc4.git0.9f29bd8b2e71.300.fc33.s390x/./include/linux/pagemap.h:
59

So somehow the pointer for console_drivers changes.

I can't provide the normal kernel crash output as printk is no longer
working, but in crash the backtrace looks like this:
crash> bt
PID: 859915 TASK: dad24000 CPU: 9 COMMAND: "ld.so.1"
LOWCORE INFO:
-psw : 0x0002000180000000 0x0000000078c8400e
-function : stop_run at 78c8400e
-prefix : 0x0041e000
-cpu timer: 0x7ffff953a957e166
-clock cmp: 0xd92876863ac66b00
-general registers:
0xf0f0f0f000000000 0x0000000078c8400e
0x0000000079984830 0x0000000079984830
0x0000000079ac62c0 0x0000000000000002
0x0000000000000038 000000000000000000
0x000000007a19157e 000000000000000000
0x00000000fffffffa 000000000000000000
0x00000000dad24000 000000000000000000
0x0000000078c8416e 0x00000380033136f0
-access registers:
0x7d5b4da0 0xa2a795c0 0000000000 0000000000
0000000000 0000000000 0000000000 0000000000
0000000000 0000000000 0000000000 0000000000
0000000000 0000000000 0000000000 0000000000
-control registers:
0x00a0000014966a10 0x000000007a15c007
0x000000000001d0a0 000000000000000000
0x000000000000ffff 0x000000000001d080
0x000000003b000000 0x00000001f29b81c7
0x0000000000008000 000000000000000000
000000000000000000 000000000000000000
000000000000000000 0x000000007a15c007
0x00000000db000000 0x000000000001d0c0
-floating point registers:
0x5faaaaabeb67f92f 0x0007fffffffffff8
0x40400000eb67f000 000000000000000000
0x0000000000000007 0x0000006400000000
0x000002aa2a63f028 0x000000000963cf85
0x000000002a642c60 000000000000000000
0x7f85516ceb67f222 0x000003ffeb67f528
0x000003ffeb67f230 0x0000000100000000
0x000003ffeb67f21d 0x000002aa28d2d970

#0 [38003313710] stop_run at 78c8400e
#1 [38003313728] atomic_notifier_call_chain at 78ced536
#2 [38003313778] panic at 7974aeca
#3 [38003313818] die at 78c897ea
#4 [38003313880] do_no_context at 78c9b230
#5 [380033138b8] pgm_check_handler at 7976438c
PSW: 0404c00180000000 0000000078d304ba (kmsg_dump_rewind+290)
GPRS: 0000000000000020 0000000000000000 0000000000000000 00000000ffffec88
000000000000000e 00000000793f4800 0000000000000224 0000000000000000
0000000000000000 0000000000000000 0000000000000224 0005007eaf000000
00000000dad24000 0000000000000000 0000038003313a68 0000038003313a18
#0 [38003313a68] console_unlock at 78d3158e
#1 [38003313b48] vprintk_emit at 78d32f50
#2 [38003313ba8] vprintk_default at 78d32f9e
#3 [38003313c00] printk at 7974bfae
#4 [38003313c90] show_code at 78c8588e
#5 [38003313de0] show_registers at 7974a26e
#6 [38003313e70] show_regs at 78c8961e
#7 [38003313ea0] pgm_check_handler at 7976438c
PSW: 0705300080000000 000000007d7983b2 (user space)
GPRS: 0000000000000000 000000007d798360 0000000000000010 000003ff00000004
000000007f854dd0 000000007d7e9788 000003ff00401b30 000000007f854ff8
000003ff00000000 000003ff7d5b4da0 000003ff00000000 000003ff00000010
000003ff004022c6 000003ff004024b0 0000000080401796 000000007f854de8

kmsg_dump_rewind+290 is:

if (!(con->flags & CON_ENABLED))
continue;

in kernel/printk/printk.c:1845

I haven't dived into whether our show_code() is doing something wrong
which was covered in the past or whether that's because of the patch
above but wanted to make you aware of that in case you have an idea.
Otherwise i have to dig into the code.

Thanks
Sven