Re: [PATCH printk v1 07/18] printk: nobkl: Add buffer management

From: John Ogness
Date: Thu Mar 23 2023 - 09:40:28 EST


On 2023-03-21, Petr Mladek <pmladek@xxxxxxxx> wrote:
>> +static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
>> +{
>> + struct console *con = ctxt->console;
>> +
>> + /* Thread context or early boot? */
>> + if (ctxt->thread)
>> + ctxt->pbufs = con->thread_pbufs;
>> + else if (!con->pcpu_data)
>> + ctxt->pbufs = &early_cons_ctxt_data.pbufs;
>> + else
>> + ctxt->pbufs = &(this_cpu_ptr(con->pcpu_data)->pbufs);
>
> What exactly do we need the per-CPU buffers for, please?
> Is it for an early boot or panic or another scenario?

In case of hostile takeovers, the panic context needs to have a buffer
that the previous context (on another CPU) will not scribble
in. Currently, hostile takeovers only occur during panics. In early boot
there is only 1 CPU.

> I would expect that per-console buffer should be enough.
> The per-console atomic lock should define who owns
> the per-console buffer. The buffer must be accessed
> carefully because any context could loose the atomic lock.

A context will string-print its message into the buffer. During the
string-print it cannot check if it is still the owner. Another CPU may
be already actively printing on that console.

> Why is kthread special?

I believe the idea was that the kthread is not bound to any CPU. But
since migration must be disabled when acquiring the console, there is no
purpose for the kthread to have its own buffer. I will remove it.

> The per-CPU buffer actually looks dangerous. It might
> be used by more NOBKL consoles. How is the access synchronized
> please? By console_list_lock? It is not obvious to me.

Each console has its own set of per-CPU buffers (con->pcpu_data).

> On the contrary, we might need 4 static buffers for the early
> boot. For example, one atomic console might start printing
> in the normal context. Second atomic console might use
> the same static buffer in IRQ context. But the first console
> will not realize it because it did not loose the per-CPU
> atomic lock when the CPU handled the interrupt..
> Or is this handled another way, please?

You are correct! Although I think 3 initdata static buffers should
suffice. (2 if the system does not support NMI).


Your feedback points out that we are allocating a lot of extra memory
for the rare case of a hostile takeover from another CPU when in
panic. I suppose it would be enough to have a single dedicated panic
buffer to use in this case.

With all that in mind, we would have 3 initdata early buffers, a single
panic buffer, and per-console buffers. So the function would look
something like this:

static __ref void cons_context_set_pbufs(struct cons_context *ctxt)
{
struct console *con = ctxt->console;

if (atomic_read(&panic_cpu) == smp_processor_id())
ctxt->pbufs = &panic_ctxt_data.pbufs;
else if (con->pbufs)
ctxt->pbufs = con->pbufs;
else
ctxt->pbufs = &early_cons_ctxt_data[early_nbcon_nested].pbufs;
}

It should be enough to increment @early_nbcon_nested in cons_get_wctxt()
and decrement it in a new cons_put_wctxt() that is called after
cons_atomic_flush_con().


Originally in tglx's design, hostile takeovers were allowed at any time,
which requires the per-CPU data per console. His idea was that the
policy about hostile takeovers should be implemented outside the nbcons
framework. However, with this newly proposed change in order to avoid
per-CPU buffers for every console, we are adding an implicit rule that
hostile takeovers only occur at panic. Maybe it is ok to hard-code this
particular policy. It would certainly save significant buffer space and
I not sure if hostile takeovers make sense outside of a panic context.

John