Re: [PATCH v10 1/2] printk: Make printk() completely async

From: Vladislav Levenetz
Date: Mon Aug 15 2016 - 10:37:02 EST




On 08/12/2016 12:44 PM, Petr Mladek wrote:
On Wed 2016-08-10 14:17:55, Viresh Kumar wrote:
+Vladi/Greg,

On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@xxxxxxx> wrote:
On Mon 04-04-16 15:51:49, Andrew Morton wrote:
+static int __init init_printk_kthread(void)
+{
+ struct task_struct *thread;
+
+ if (printk_sync)
+ return 0;
+
+ thread = kthread_run(printk_kthread_func, NULL, "printk");
This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
task will block printk for ever. This seems bad.
I have to research this a bit but won't the SCHED_FIFO task that has
potentially unbounded amount of work lockup the CPU even though it does
occasional cond_resched()?
We are facing complete hogs because of the printk thread being a SCHED_FIFO
task and have this patch to fix it up for now.

Author: Vladislav Levenetz <vblagoev@xxxxxxxxxx>
Date: Wed Aug 10 13:58:00 2016 -0700

SW-7786: printk: Lower the priority of printk thread

Flooding the console (with a test module) in a tight loop indefinitely
makes android user interface very sluggish. Opening YouTube app and the
device hangs and becomes even more unresponsive to the point it
completely hangs.

The asynchronous printk thread is a SCHED FIFO thread with priority
MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO)
instead, we observe much better performance using the same printk flood
test. We don't even notice any kind of sluggishness during device usage.
We can play a YouTube clip smoothly and use the device normally in
general. The kernel log looks fine as well, as the flood of messages
continue normally.

Signed-off-by: Vladislav Levenetz <vblagoev@xxxxxxxxxx>
Signed-off-by: Viresh Kumar <viresh.kumar@xxxxxxxxxx>
---
kernel/printk/printk.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c32872872cb6..ad5b30e5e6d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data)
static int __init_printk_kthread(void)
{
struct task_struct *thread;
- struct sched_param param = {
- .sched_priority = MAX_RT_PRIO - 1,
- };

if (!printk_kthread_can_run || printk_sync || printk_kthread)
return 0;
@@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void)
return PTR_ERR(thread);
}

- sched_setscheduler(thread, SCHED_FIFO, &param);
printk_kthread = thread;
return 0;
}
IMHO, this is fine. We force the synchronous mode in critical
situations anyway.

But I was curious if we could hit a printk from the wake_up_process().
The change above causes using the fair scheduler and there is
the following call chain [*]

vprintk_emit()
-> wake_up_process()
-> try_to_wake_up()
-> ttwu_queue()
-> ttwu_do_activate()
-> ttwu_activate()
-> activate_task()
-> enqueue_task()
-> enqueue_task_fair() via p->sched_class->enqueue_task
-> cfs_rq_of()
-> task_of()
-> WARN_ON_ONCE(!entity_is_task(se))

We should never trigger this because printk_kthread is a task.
But what if the date gets inconsistent?

Then there is the following chain:

vprintk_emit()
-> wake_up_process()
-> try_to_wake_up()
-> ttwu_queue()
-> ttwu_do_activate()
-> ttwu_activate()
-> activate_task()
-> enqueue_task()
-> enqueue_task_fair() via p->sched_class->enqueue_task
->hrtick_update()
-> hrtick_start_fair()
-> WARN_ON(task_rq(p) != rq)

This looks like another paranoid consistency check that might be
triggered when the scheduler gets messed.

I see few possible solutions:

1. Replace the WARN_ONs by printk_deferred().

This is the usual solution but it would make debugging less convenient.


2. Force synchronous printk inside WARN()/BUG() macros.

This would make sense even from other reasons. These are printed
when the system is in a strange state. There is no guarantee that
the printk_kthread will get scheduled.


3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
printk_func.

It might be elegant. But we do not want this outside the scheduler
code. Therefore we would need special variants of WARN_*_SCHED()
BUG_*_SCHED() macros.


I personally prefer the 2nd solution. What do you think about it,
please?


Best Regards,
Petr

Hi Petr,

Sorry with for the late reply.

Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if only a single CPU is running.
We already had such a situation with system suspend. During a specific test on our device sometimes we hit a WARN from the time keeping core. (Cannot recall which one exactly. Viresh have it) from a printk wake_up path during system suspend and with already only one CPU running.
So we were forced to make printing synchronous in the suspend path prior disabling all non-boot cpu's.

Your solution number 2) sounds reasonable to me.

I'm wondering if we could hit a WARN()/BUG() somewhere from the fair scheduler like the example you made for the RT sched?

Thanks.