Re: [RFC PATCH printk v1] printk: ringbuffer: Do not skip non-finalized with prb_next_seq()

From: Petr Mladek
Date: Thu Oct 26 2023 - 04:07:14 EST


On Wed 2023-10-25 17:16:09, Petr Mladek wrote:
> On Thu 2023-10-19 15:31:45, John Ogness wrote:
> > --- a/kernel/printk/printk_ringbuffer.c
> > +++ b/kernel/printk/printk_ringbuffer.c
> > +static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq,
> > + struct printk_record *r, unsigned int *line_count);
> > +
> > +/*
> > + * Check if there are records directly following @last_finalized_seq that are
> > + * finalized. If so, update @last_finalized_seq to the latest of these
> > + * records. It is not allowed to skip over records that are not yet finalized.
>
> I would add some details about what expect from this value. Something
> like:
>
> * @last_finalized_seq value guarantees that all records up to this
> * sequence number are finalized and might be read. The only exception
> * are too old records which have already been overwritten.
> *
> * Also it is guaranteed that the value can only grow.
> *
> * But it is just a best effort. There is _no_ synchronization between
> * writers finalizing records and @last_finalized_seq updates. The result
> * might be a lower value because updaters might not see finalized
> * records from other CPUs.
> *
> * There is _no_ easy way to serialize these two operations. It would
> * require to remember two values which might be called handled in:
> *
> * @last_finalized_id in desc_make_final()
> * @last_readable_seq in desc_update_last_readable_seq()
> *
> * and these two values would need to be updated atomically together
> * so that only the last updater of the @id part would be allowed
> * to update the @seq part. Also it would require a barrier so
> * that each writer would see the finalized state from other
> * writers whom updated the @id part earlier.
> *
> * Summary:
> *
> * This value might be used by readers only to check the last
> * readable seq number at the given time. They must count with
> * the fact that new records might appear at any time.
> *
> * Beware that this value is not increased with every finalized
> * record. It stays the same when there are not-yet-finalized
> * records with lower sequence number.
> *
> * In particular, it does not guarantee that readers woken
> * via wake_up_klogd would be able to flush all messages
> * up to the one just stored by vprintk_store(). For example,
> * it does not guarantee that console_unlock() would flush
> * all pending messages.
> */

The last paragraph sounds scary. But it is a pretty old problem.
I was curious why nobody noticed it.

IMHO, we are mostly safe in practice because:

1. Every printk() either tries to flush the consoles or
queues irq_work to do a deferred flush.

2. Every printk() calls wake_up_klogd() to wake user space
log daemons.

3. console_unlock() checks prb_next_seq() after releasing
the semaphore. It goes back and flushes any message
finalized in parallel or the parallel writer is able
to take the console semaphore.

By other words, even when one flush is not able to flush
everything there always should be scheduled someone
who would flush the rest in a near future.

The only problem might be a missing barrier when some CPU
sees a finalized record and others do not see it. But it is
probably very hard to hit in practice.

Anyway, I haven't been aware about this prb_next_seq() limitation
until last week. We should keep it in mind or improve it.
But it seems that it is not that critical in the end.

Best Regards,
Petr