Re: [GIT pull] printk updates for 4.15

From: Prarit Bhargava
Date: Tue Nov 14 2017 - 08:29:11 EST




On 11/14/2017 05:03 AM, Petr Mladek wrote:
> On Mon 2017-11-13 17:18:33, Linus Torvalds wrote:
>> On Mon, Nov 13, 2017 at 1:36 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>> Linus,
>>>
>>> please pull the latest core-printk-for-linus git tree from:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-printk-for-linus
>>>
>>> This update adds the mechanisms to emit printk timestamps based on
>>> different clocks:
>>>
>>> - scheduler clock (default)
>>> - monotonic time
>>> - boot time
>>> - wall clock time
>>>
>>> This helps to correlate dmesg with user space log information, tracing,
>>> etc. This can be local correlation or in case of wall clock time correlated
>>> across machines, assumed that all machines are synchronized via
>>> NTP/PTP.
>>
>> Honestly, this just seems bogus to me, particularly since it's a single choice.
>>
>> The *sane* model would be to
>>
>> (a) continue to use the existing time that we always have
>> (local_clock()) in the printk timestamps, and don't confuse people
>> with the semantics of that field changing.

The only time anyone would notice a change is if they used real time. Otherwise
the behaviour is pretty much the same -- the kernel starts booting with offset
0.00000 and starts incrementing shortly thereafter.

>>
>> (b) just emit a "synchronization printk" every once in a while, which
>> is obviously also using the same standard time source, but the line
>> actually _says_ what the other time sources are.
>
> This was actually the original approach by Mark Salyzyn, see
> https://lkml.kernel.org/r/20170720182505.9357-1-salyzyn@xxxxxxxxxxx

One message every 5 mins over 2 weeks is ~4000 messages. On systems with long
up times this would fill the kmsg log. The boot log is important to determine
the system topology & device configuration, etc.

>
>
>> Then it's easy to see what the printk time source is, in relation to
>> any _number_ of other timesources. And if that synchronization printk
>> is nicely formatted, it will even be something that people appreciate
>> seeing in dmesg _irrespective_ of any actual synchronization issues.
>>
>> And something that reads the journal could trivially pick up on the
>> synchronization printk line, and then correct the local timesource to
>> whatever internal journal timesource it wants to. And the important
>> thing is that because you just give *all* timesources in the
>> synchronization line, that choice isn't fixed by some random kernel
>> configuration or setting.
>
> One risk is that the messages might get lost. For example, they might be
> filtered by loglevel or during a flood of messages on slow consoles.
>
>
>> Instead, this seems to have a completely broken "pick one time source
>> model at random" approach, so now different machines will have
>> different models, and it will likely _break_ existing code that picks
>> the timesource from the kernel dmesg, unless you just pick the local
>> one.
>
> AFAIK, local clock is not synchronous between different machines
> and even CPUs on the same machine. It was used in printk() because
> it was lockless. Therefore it is kind of random itself.
>
> You could make some post-synchronization using that printed
> time stamps from other clocks. But it is not reliable (lost
> messages) and somehow inconvenient.
>
> I am not super happy that userspace might need update with
> the approach in this pull request. But it seems to be rather
> trivial. The timestamp (number) in the log can be converted into
> the date+time as following:
>
> + realtime: timestamp ~= number of micro sec. since 1.1.1970
> + other clocks: timestamp ~= number of micro sec. since boot

Note that the current behaviour of dmesg & crash are maintained. This patch
does not break existing userspace. If the kernel was built with
'CONFIG_PRINTK_TIME=y' both dmesg & crash function exactly as before.

For the new timestamp (boot, real time, or wall clock) the patch to util-linux's
dmesg is trivial: Do a read of sysfs to determine what type of clock is used
and set the offset based on that value. crash's log mirrors what was on the
console on different architectures.

I've been using this patchset for debugging purposes across different
architectures for a while now and haven't seen any issues with dmesg or crash.

>
>
>> That seems like bad design, and really stupid.
>>
>> Am I missing something? Because as-is, this just seems like a horribly
>> bad feature to me. I'm not pulling it without some very good arguments
>> for this all.
>

Mark has an Android issue that he should speak to that effects large numbers of
systems.

My issue is that many systems with bad FW & HW have timeouts due to bugs in
system configuration. For example, a system may stop functioning at exactly 200
hours (some events are *much* longer, some events are shorter). A time that
specific isn't a random event and has something to do with buggy
firmware/BIOS/BMC/HW/power. Identifying & debugging these continues to be very
difficult using the local_clock() because it is not synchronized to to NTP or PTP.

> I wonder if the current approach might be acceptable if we print
> some suffix after real-time or any non-local_clock timestamps.
> This would allow userspace to always handle this correctly.
> IMHO, it would be more reliable and convenient than the
> "synchronization printks".

That was another suggestion Mark had made too. Another idea was to encode the
timestamp type in the message structure itself ('b' for boot, for example).
These approaches are significantly more difficult to code in userspace than the
trivial dmesg patch previously mentioned.

P.