Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking

From: Marek Szyprowski
Date: Wed May 04 2022 - 02:05:34 EST


Hi,

On 03.05.2022 08:49, Petr Mladek wrote:
> On Tue 2022-05-03 01:13:19, Marek Szyprowski wrote:
>> On 02.05.2022 15:17, Petr Mladek wrote:
>>> On Mon 2022-05-02 11:19:07, Marek Szyprowski wrote:
>>>> On 30.04.2022 18:00, John Ogness wrote:
>>>>> On 2022-04-29, Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> wrote:
>>>>>> The same issue happens if I boot with init=/bin/bash
>>>>> Very interesting. Since you are seeing all the output up until you try
>>>>> doing something, I guess receiving UART data is triggering the issue.
>>>> Right, this is how it looks like.
>>>>
>>>>>> I found something really interesting. When lockup happens, I'm still
>>>>>> able to log via ssh and trigger any magic sysrq action via
>>>>>> /proc/sysrq-trigger.
>>>>> If you boot the system and directly login via ssh (without sending any
>>>>> data via serial), can you trigger printk output on the UART? For
>>>>> example, with:
>>>>>
>>>>> echo hello > /dev/kmsg
>>>>>
>>>>> (You might need to increase your loglevel to see it.)
>>>> Data written to /dev/kmsg and all kernel logs were always displayed
>>>> correctly. Also data written directly to /dev/ttyAML0 is displayed
>>>> properly on the console. The latter doesn't however trigger the input
>>>> related activity.
>>>>
>>>> It looks that the data read from the uart is delivered only if other
>>>> activity happens on the kernel console. If I type 'reboot' and press
>>>> enter, nothing happens immediately. If I type 'date >/dev/ttyAML0' via
>>>> ssh then, I only see the date printed on the console. However if I type
>>>> 'date >/dev/kmsg', the the date is printed and reboot happens.
>>> This is really interesting.
>>>
>>> 'date >/dev/kmsg' should be handled like a normal printk().
>>> It should get pushed to the console using printk kthread,
>>> that calls call_console_driver() that calls con->write()
>>> callback. In our case, it should be meson_serial_console_write().
>>>
>>> I am not sure if meson_serial_console_write() is used also
>>> when writing via /dev/ttyAML0.
>>>
>>>>>> It turned out that the UART console is somehow blocked, but it
>>>>>> receives and buffers all the input. For example after issuing "echo
>>>>>> >/proc/sysrq-trigger" from the ssh console, the UART console has been
>>>>>> updated and I see the magic sysrq banner and then all the commands I
>>>>>> blindly typed in the UART console! However this doesn't unblock the
>>>>>> console.
>>>>> sysrq falls back to direct printing. This would imply that the kthread
>>>>> printer is somehow unable to print.
>>>>>
>>>>>> Here is the output of 't' magic sys request:
>>>>>>
>>>>>> https://protect2.fireeye.com/v1/url?k=8649f24d-e73258c4-86487902-74fe48600034-a2ca6bb18361467d&q=1&e=1bc4226f-a422-42b9-95e8-128845b8609f&u=https%3A%2F%2Fpastebin.com%2FfjbRuy4f
>>>>> It looks like the call trace for the printing kthread (pr/ttyAML0) is
>>>>> corrupt.
>>>> Right, good catch. For comparison, here is a 't' sysrq result from the
>>>> 'working' serial console (next-20220429), which happens usually 1 of 4
>>>> boots:
>>>>
>>>> https://protect2.fireeye.com/v1/url?k=610106f1-008a13b6-61008dbe-000babff99aa-11083c39c44861df&q=1&e=2eafad9e-c5d2-4696-9d78-f3b5885256dc&u=https%3A%2F%2Fpastebin.com%2Fmp8zGFbW
>>> Strange. The backtrace is weird here too:
>>>
>>> [ 50.514509] task:pr/ttyAML0 state:R running task stack: 0 pid: 65 ppid: 2 flags:0x00000008
>>> [ 50.514540] Call trace:
>>> [ 50.514548] __switch_to+0xe8/0x160
>>> [ 50.514561] meson_serial_console+0x78/0x118
>>>
>>> There should be kthread() and printk_kthread_func() on the stack.
>>>
>>> Hmm, meson_serial_console+0x78/0x118 is weird on its own.
>>> meson_serial_console is the name of the structure. I would
>>> expect a name of the .write callback, like
>>> meson_serial_console_write()
>> Well, this sounds a bit like a stack corruption. For the reference, I've
>> checked what magic sysrq 't' reports for my other test boards:
>>
>> RaspberryPi4:
>>
>> [  166.702431] task:pr/ttyS0        state:R stack:    0 pid:   64
>> ppid:     2 flags:0x00000008
>> [  166.711069] Call trace:
>> [  166.713647]  __switch_to+0xe8/0x160
>> [  166.717216]  __schedule+0x2f4/0x9f0
>> [  166.720862]  log_wait+0x0/0x50
>> [  166.724081] task:vfio-irqfd-clea state:I stack:    0 pid:   65
>> ppid:     2 flags:0x00000008
>> [  166.732698] Call trace:
>>
>>
>> ARM Juno R1:
>>
>> [   74.356562] task:pr/ttyAMA0      state:R  running task stack:    0
>> pid:   47 ppid:     2 flags:0x00000008
>> [   74.356605] Call trace:
>> [   74.356617]  __switch_to+0xe8/0x160
>> [   74.356637]  amba_console+0x78/0x118
>> [   74.356657] task:kworker/2:1     state:I stack:    0 pid:   48
>> ppid:     2 flags:0x00000008
>> [   74.356695] Workqueue:  0x0 (mm_percpu_wq)
>> [   74.356738] Call trace:
>>
>>
>> QEMU virt/arm64:
>>
>> [  174.155760] task:pr/ttyAMA0      state:S stack:    0 pid:   26
>> ppid:     2 flags:0x00000008
>> [  174.156305] Call trace:
>> [  174.156529]  __switch_to+0xe8/0x160
>> [  174.157131]  0xffff5ebbbfdd62d8
> You mentioned in the other mail that the other boards work as
> expected. I mean that console gets stuck only on the meson board.
> Is it true, please?

Right. Even on Meson based boards the console is operational about 1 of
4 boots.


> The stack looks really weird. But another weird thing is that
> even the meson board is able to show the messages, for example,
> using echo hello >/dev/kmsg. It suggests that the kthreads
> somehow work.
>
> There is also a possibility that this code path is optimized
> some special way and the unwinder has troubles to show
> the stack correctly.

I doubt that this is a result of the compiler's optimization. See my
logs from QEMU's virt machine. I've managed to capture 2 states of
ttyAMA0 task. One shows some kind of stack corruption imho. It doesn't
happen always though.

> ...

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland