Re: [PATCH printk v5 1/1] printk: extend console_lock for per-console locking

From: Marek Szyprowski
Date: Fri May 06 2022 - 07:25:36 EST


Hi All,

On 27.04.2022 09:08, Marek Szyprowski wrote:
> On 26.04.2022 15:16, Petr Mladek wrote:
>> On Tue 2022-04-26 14:07:42, Petr Mladek wrote:
>>> On Mon 2022-04-25 23:04:28, John Ogness wrote:
>>>> Currently threaded console printers synchronize against each
>>>> other using console_lock(). However, different console drivers
>>>> are unrelated and do not require any synchronization between
>>>> each other. Removing the synchronization between the threaded
>>>> console printers will allow each console to print at its own
>>>> speed.
>>>>
>>>> But the threaded consoles printers do still need to synchronize
>>>> against console_lock() callers. Introduce a per-console mutex
>>>> and a new console boolean field @blocked to provide this
>>>> synchronization.
>>>>
>>>> console_lock() is modified so that it must acquire the mutex
>>>> of each console in order to set the @blocked field. Console
>>>> printing threads will acquire their mutex while printing a
>>>> record. If @blocked was set, the thread will go back to sleep
>>>> instead of printing.
>>>>
>>>> The reason for the @blocked boolean field is so that
>>>> console_lock() callers do not need to acquire multiple console
>>>> mutexes simultaneously, which would introduce unnecessary
>>>> complexity due to nested mutex locking. Also, a new field
>>>> was chosen instead of adding a new @flags value so that the
>>>> blocked status could be checked without concern of reading
>>>> inconsistent values due to @flags updates from other contexts.
>>>>
>>>> Threaded console printers also need to synchronize against
>>>> console_trylock() callers. Since console_trylock() may be
>>>> called from any context, the per-console mutex cannot be used
>>>> for this synchronization. (mutex_trylock() cannot be called
>>>> from atomic contexts.) Introduce a global atomic counter to
>>>> identify if any threaded printers are active. The threaded
>>>> printers will also check the atomic counter to identify if the
>>>> console has been locked by another task via console_trylock().
>>>>
>>>> Note that @console_sem is still used to provide synchronization
>>>> between console_lock() and console_trylock() callers.
>>>>
>>>> A locking overview for console_lock(), console_trylock(), and the
>>>> threaded printers is as follows (pseudo code):
>>>>
>>>> console_lock()
>>>> {
>>>>          down(&console_sem);
>>>>          for_each_console(con) {
>>>>                  mutex_lock(&con->lock);
>>>>                  con->blocked = true;
>>>>                  mutex_unlock(&con->lock);
>>>>          }
>>>>          /* console_lock acquired */
>>>> }
>>>>
>>>> console_trylock()
>>>> {
>>>>          if (down_trylock(&console_sem) == 0) {
>>>>                  if (atomic_cmpxchg(&console_kthreads_active, 0,
>>>> -1) == 0) {
>>>>                          /* console_lock acquired */
>>>>                  }
>>>>          }
>>>> }
>>>>
>>>> threaded_printer()
>>>> {
>>>>          mutex_lock(&con->lock);
>>>>          if (!con->blocked) {
>>>>         /* console_lock() callers blocked */
>>>>
>>>>                  if
>>>> (atomic_inc_unless_negative(&console_kthreads_active)) {
>>>>                          /* console_trylock() callers blocked */
>>>>
>>>>                          con->write();
>>>>
>>>> atomic_dec(&console_lock_count);
>>>>                  }
>>>>          }
>>>>          mutex_unlock(&con->lock);
>>>> }
>>>>
>>>> The console owner and waiter logic now only applies between contexts
>>>> that have taken the console_lock via console_trylock(). Threaded
>>>> printers never take the console_lock, so they do not have a
>>>> console_lock to handover. Tasks that have used console_lock() will
>>>> block the threaded printers using a mutex and if the console_lock
>>>> is handed over to an atomic context, it would be unable to unblock
>>>> the threaded printers. However, the console_trylock() case is
>>>> really the only scenario that is interesting for handovers anyway.
>>>>
>>>> @panic_console_dropped must change to atomic_t since it is no longer
>>>> protected exclusively by the console_lock.
>>>>
>>>> Since threaded printers remain asleep if they see that the console
>>>> is locked, they now must be explicitly woken in __console_unlock().
>>>> This means wake_up_klogd() calls following a console_unlock() are
>>>> no longer necessary and are removed.
>>>>
>>>> Also note that threaded printers no longer need to check
>>>> @console_suspended. The check for the @blocked field implicitly
>>>> covers the suspended console case.
>>>>
>>>> Signed-off-by: John Ogness <john.ogness@xxxxxxxxxxxxx>
>>> Nice, it it better than v4. I am going to push this for linux-next.
>>>
>>> Reviewed-by: Petr Mladek <pmladek@xxxxxxxx>
>> JFYI, I have just pushed this patch instead of the one
>> from v4 into printk/linux.git, branch rework/kthreads.
>>
>> It means that this branch has been rebased. It will be
>> used in the next refresh of linux-next.
>
> This patchset landed in linux next-20220426. In my tests I've found
> that it causes deadlock on all my Amlogic Meson G12B/SM1 based boards:
> Odroid C4/N2 and Khadas VIM3/VIM3l. The deadlock happens when system
> boots to userspace and getty (with automated login) is executed. I
> even see the bash prompt, but then the console is freezed. Reverting
> this patch (e00cc0e1cbf4) on top of linux-next (together with
> 6b3d71e87892 to make revert clean) fixes the issue.
>
The Amlogic Meson related issue has been investigated and fixed:

https://lore.kernel.org/all/b7c81f02-039e-e877-d7c3-6834728d2117@xxxxxxxxxxx/

but I just found that there is one more issue.


It appears on QCom-based DragonBoard 410c SBC
(arch/arm64/boot/dts/qcom/apq8016-sbc.dts). To see it on today's linux
next-20220506, one has to revert
42cd402b8fd4672b692400fe5f9eecd55d2794ac, otherwise lockdep triggers
other warning and it is turned off too early:

================================
WARNING: inconsistent lock state
5.18.0-rc5-next-20220506+ #11869 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff80000aaa8478 (&port_lock_key){?.+.}-{2:2}, at: msm_uart_irq+0x38/0x750
{HARDIRQ-ON-W} state was registered at:
  lock_acquire.part.0+0xe0/0x230
  lock_acquire+0x68/0x84
  _raw_spin_lock+0x5c/0x80
  __msm_console_write+0x1ac/0x220
  msm_console_write+0x48/0x60
  __console_emit_next_record+0x188/0x420
  printk_kthread_func+0x3a0/0x3bc
  kthread+0x118/0x11c
  ret_from_fork+0x10/0x20
irq event stamp: 12182
hardirqs last  enabled at (12181): [<ffff800008e3d2a8>]
cpuidle_enter_state+0xc4/0x30c

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.18.0-rc5-next-20220506+ #11869
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Call trace:
 dump_backtrace.part.0+0xd0/0xe0
 show_stack+0x18/0x6c
 dump_stack_lvl+0x8c/0xb8
 dump_stack+0x18/0x34
 print_usage_bug.part.0+0x208/0x22c
 mark_lock+0x710/0x954
 __lock_acquire+0x9fc/0x20cc
 lock_acquire.part.0+0xe0/0x230
 lock_acquire+0x68/0x84
 _raw_spin_lock_irqsave+0x80/0xcc
 msm_uart_irq+0x38/0x750
 __handle_irq_event_percpu+0xac/0x3d0
 handle_irq_event+0x4c/0x120
 handle_fasteoi_irq+0xa4/0x1a0
 generic_handle_domain_irq+0x3c/0x60
 gic_handle_irq+0x44/0xc4
 call_on_irq_stack+0x2c/0x54
 do_interrupt_handler+0x80/0x84
 el1_interrupt+0x34/0x64
 el1h_64_irq_handler+0x18/0x24
 el1h_64_irq+0x64/0x68
 cpuidle_enter_state+0xcc/0x30c
 cpuidle_enter+0x38/0x50
 do_idle+0x22c/0x2bc
 cpu_startup_entry+0x28/0x30
 rest_init+0x110/0x190
 arch_post_acpi_subsys_init+0x0/0x18
 start_kernel+0x6c4/0x704
 __primary_switched+0xc0/0xc8
 INIT: version 2.88 booting
[info] Using makefile-style concurrent boot in runlevel S.


Reverting the following patches on top of linux next-20220506
fixes/hides this lockdep warning:

6b3d71e87892 ("printk: remove @console_locked")
8e274732115f ("printk: extend console_lock for per-console locking")
09c5ba0aa2fc ("printk: add kthread console printers")


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland