Re: [PATCH 2/2] printk/panic/x86: Allow to access printk log buffer after crash_smp_send_stop()

From: Sergey Senozhatsky
Date: Thu Jul 18 2019 - 07:30:01 EST


On (07/18/19 14:07), Konstantin Khlebnikov wrote:
> > Let me test the waters. Criticize the following idea:
> >
> > Can we, sort of, disconnect "supposed to be dead" CPUs from printk()
> > so then we can unconditionally re-init printk() from panic-CPU?
> >
> > We have per-CPU printk_state; so panic-CPU can set, let's say,
> > DEAD_CPUS_TELL_NO_TALES bit on all CPUs but self, and vprintk_func()
> > will do nothing if DEAD_CPUS_TELL_NO_TALES bit set on particular
> > CPU. Foreign CPUs are not even supposed to be alive, and smp_send_stop()
> > waits for IPI acks from secondary CPUs long enough on average (need
> > to check that) so if one of the CPUs is misbehaving and doesn't want
> > to die (geez...) we will just "disconnect" it from printk() to minimize
> > possible logbuf/console drivers interventions and then proceed with
> > panic; assuming that misbehaving CPUs are actually up to something
> > sane. Sometimes, you know, in some cases, those CPUs are already dead:
> > either accidentally powered off, or went completely nuts and do nothing,
> > etc. etc. but we still can kdump() and console_flush_on_panic().
>
> Good idea.
> Panic-CPU could just increment state to reroute printk into 'safe'
> per-cpu buffer.

Yeah, that's better.

So we can do something like this:

@@ -269,15 +269,21 @@ void printk_safe_flush_on_panic(void)
         * Make sure that we could access the main ring buffer.
         * Do not risk a double release when more CPUs are up.
         */
-       if (raw_spin_is_locked(&logbuf_lock)) {
-               if (num_online_cpus() > 1)
-                       return;
+       debug_locks_off();
+       raw_spin_lock_init(&logbuf_lock);
+       /* + re-init the rest of printk() locks */
+       printk_safe_flush();
+}

[..]

+void printk_switch_to_panic_mode(int panic_cpu)
+{
+       int cpu;

+       for_each_possible_cpu(cpu) {
+               if (cpu == panic_cpu)
+                       continue;
+               per_cpu(printk_context, cpu) = 42;
+       }
 }

Then call printk_switch_to_panic_mode() from panic(). This way we don't
need to touch arch code, and it also covers the case when some new arch
gains NMI support.
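
To make the rerouting idea concrete, here is a minimal userspace sketch
of it, not kernel code. It assumes that the placeholder value 42 above
stands for "some context value that sends printk() to the per-CPU safe
buffer". NR_CPUS, PRINTK_PANIC_REROUTE, main_log, safe_log,
model_printk() and model_switch_to_panic_mode() are all hypothetical
names made up for the illustration; plain arrays stand in for per-CPU
variables and for the real log buffers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS 4                   /* hypothetical CPU count for the model */
#define PRINTK_PANIC_REROUTE 0x1    /* hypothetical flag, not a kernel constant */

static int printk_context[NR_CPUS];   /* stands in for per_cpu(printk_context, cpu) */
static char main_log[256];            /* stands in for the shared log buffer */
static char safe_log[NR_CPUS][128];   /* stands in for the per-CPU 'safe' buffers */

/* Append formatted text to a fixed-size buffer, truncating on overflow. */
static void buf_append(char *buf, size_t size, const char *fmt, va_list args)
{
	size_t used = strlen(buf);

	vsnprintf(buf + used, size - used, fmt, args);
}

/* Model of vprintk_func(): pick the destination from the caller's per-CPU state. */
static void model_printk(int cpu, const char *fmt, ...)
{
	va_list args;

	assert(cpu >= 0 && cpu < NR_CPUS);
	va_start(args, fmt);
	if (printk_context[cpu] & PRINTK_PANIC_REROUTE)
		buf_append(safe_log[cpu], sizeof(safe_log[cpu]), fmt, args);
	else
		buf_append(main_log, sizeof(main_log), fmt, args);
	va_end(args);
}

/* Run by the panicking CPU: every other CPU is cut off from the main log. */
static void model_switch_to_panic_mode(int panic_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu != panic_cpu)
			printk_context[cpu] |= PRINTK_PANIC_REROUTE;
	}
}
```

In this model, after model_switch_to_panic_mode(0) a message from CPU 1
lands only in safe_log[1], while CPU 0 keeps writing to main_log. So
the panic CPU is the only writer of the main buffer, which is what
would make re-initializing its lock unconditionally reasonable.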

-ss