Re: [PATCH 0/2] printk, Add printk.clock kernel parameter [v2]

From: Thomas Gleixner
Date: Wed Jan 13 2016 - 12:30:07 EST


On Wed, 13 Jan 2016, Prarit Bhargava wrote:
> On 01/13/2016 08:45 AM, Thomas Gleixner wrote:
> > On Wed, 13 Jan 2016, Prarit Bhargava wrote:
> >> This patchset introduces additional NMI safe timekeeping functions and the
> >> kernel parameter printk.clock=[local|boot|real|tai] allowing a
> >> user to specify an adjusted clock to use with printk timestamps. The
> >> hardware clock, or the existing functionality, is preserved by default.
> >
> > You still fail to explain WHY we need a gazillion of different clocks
> > here.
>
> I've had cases in the past where an earlier warning/failures have resulted in a
> much later panics, across several systems. Trying to synchronize all of these
> events with wall clock time is all but impossible after the event has occurred.
> I've seen cases where earlier MCAs lead to panics, earlier I/O warnings have
> lead to panics, panics/problems at a specific time, etc. Attempting to figure
> out what happened in the lab or cluster is not trivial without having a
> timestamp that can actually be synchronized against a wall clock.
>
> In the case that made me finally submit this, the disks were generating
> seemingly random I/O timeout errors which meant at that point I had no logging
> to disk (and this assumes the systems are logging to disk because I'm seeing
> more and more systems that are not). I did manage to get dmesg from crash
> dumps, however, the problem then became trying to figure out exactly what time
> the system started having problems (Was there an external event that lead to the
> failures and panics? Are the early failures across systems at the same time, or
> did they occur over several hours? Did the systems all panic at the same time?
> Was the failure at a specific time after boot and due to a weird timeout? etc.)
>
> Trying to figure out what actually is happening & debugging becomes much easier
> with the above timestamp patch because I can actually tell what time something
> happened.
>
> Admittedly, I have not used TAI. I started by using REAL, and then the BOOT
> clock to see if this was some sort of strange 10-day timeout on the system. I
> only included TAI option for completeness.

So I really don't see a reason why we would need TAI for that. Just because we
can is not one.

BOOT is questionable as well simply because the boot offset only changes when
the system suspends/resumes. So we rather note that time delta in dmesg than
having a gazillion of command line options.

That leaves REAL as a possible useful option. Though the points where the MONO
to REAL offset changes are rather limited as well (Initial RTC readout,
do_settimeofday(), adjtimex(), leap seconds, suspend/resume).

> > What's the problem with using the fast monotonic clock instead of local_clock
> > and be done with it? I really don't see the point why we would need
> > boot/real/tai and all the extra churn in the fast clock.
>
> AFAICT the local_clock() (on x86 at least) is accessed without accessing a lock
> and is just a tsc read. I assumed that local_clock() fast and lockless access
> was the "best" method for obtaining a time stamp.

printk() is hardly a hotpath function and local_clock() is not much faster
than ktime_get_mono_fast_ns(). It's lockless as well otherwise it would not be
NMI safe.

So what's the point?

> I would only suggest using the other clocks on systems that are "known
> stable", or running kernels that are considered to have stable timekeeping
> code.

I neither run production stuff on systems which are known to be unstable nor
on a kernel which does not have a stable timekeeping code.

Sorry, but your argumentation does not make any sense.

You can solve the whole business by changing the timestamp in printk_log to

u64 mono;
u64 offset_real;

and have a function which does:

u64 ktime_get_log_ts(u64 *offset_real)
{
*offset_real = tk_core.timekeeper.offs_real;

if (timekeeping_active)
return ktime_get_mono_fast_ns();
else
return local_clock();
}

That uses extra 8 bytes of data per log buffer entry, but that's really a non
issue. That already gives you the data for crash analysis. If you want to be
able to have the clock REAL time stamps in console/dmesg/syslog then you
simply can make printk_time an integer and act depending on the value:

printk_time=0 timestamps off
printk_time=1 timestamps clock MONO (default and backwards compatible)
printk_time=2 timestamps clock REAL

That's minimal intrusive and does the job without the need of a gazillion of
command line options and new functions which are just pointless bloat.

Note, that on 32bit systems accessing tk_core.timekeeper.offs_real is racy
vs. a concurrent update, but with your proposed solution it's not any
different.

Thanks,

tglx