Re: [PATCH 1/3] timekeeping: NMI safe converter from a given time to monotonic

From: John Stultz
Date: Tue Jan 24 2023 - 13:43:45 EST


On Tue, Jan 24, 2023 at 7:09 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
> On 2023-01-24 2:01 a.m., John Stultz wrote:
> > On Mon, Jan 23, 2023 at 10:27 AM <kan.liang@xxxxxxxxxxxxxxx> wrote:
> >> + /*
> >> + * Check whether the given timestamp is on the current
> >> + * timekeeping interval.
> >> + */
> >> + now = tk_clock_read(tkr);
> >> + interval_start = tkr->cycle_last;
> >> + if (!cycle_between(interval_start, cycles, now))
> >> + return -EOPNOTSUPP;
> >
> > So. I've not fully thought this out, but it seems like it would be
> > quite likely that you'd run into the case where the cycle_last value
> > is updated and your earlier TSC timestamp isn't valid for the current
> > interval. The get_device_system_crosststamp() logic has a big chunk of
> > complex code to try to handle this case by interpolating the cycle
> > value back in time. How well does just failing in this case work out?
> >
>
> For the case, perf fallback to the time captured in the NMI handler, via
> ktime_get_mono_fast_ns().

This feels like *very* subtle behavior. Maybe I'm misunderstanding,
but the goal seems to be to have more accurate timestamps on the hw
events, and using the captured tsc timestamp avoids the measuring
latency reading the time again. But if every timekeeping update
interval (~tick) you transparently get a delayed value due to the
fallback, it makes it hard to understand which timestamps are better
or worse. The latency between two reads may be real or it may be just
bad luck. This doesn't intuitively seem like a great benefit over more
consistent latency of just using the ktime_get_mono_fast()
timestamping.

> The TSC in PEBS is captured by HW when the sample was generated. There
> should be a small delta compared with the time captured in the NMI
> handler. But I think the delta should be acceptable as a backup solution
> for the most analysis cases. Also, I don't think the case (the
> cycle_last value is updated during the monitoring) should occur very
> often either. So I drop the history support to simplify the function.

So the reads and this function are *always* used in NMI context? Has
this been stressed with things like SMIs to see how it does if
interrupted in those cases?

My worry is that (as I bored everyone earlier), the
ktime_get_*_fast_ns() interfaces already have some sharp edges and
need a fair amount of thought as to when they should be used. This is
sort of compounding that adding an interface that has further special
cases where it can fail, making it difficult to fully understand and
easier to accidentally misuse.

My other concern is that interfaces always get stretched and used
beyond anything they were initially planned for (see the
ktime_get_*fast_ns() interfaces here as an example! :), and in this
case the logic seems to have lots of implicit dependencies on the
facts of your specific use case, so it seems a bit fragile should
folks on other architectures with other constraints try to use it.

So I just want to push a bit to think how you might be able to
extend/generalize the existing get_system_crosststamp for your
purposes, or alternatively find a way to simplify the logic's behavior
so its less tied to specific constraints ("this works most of the time
from NMI, but otherwise no promises"). Or at least some better
documentation around the code, its uses and its constraints? ( "NMI
safe" is not the same as "Only safe to use from NMI" :)

And apologies if I've misunderstood things here.

thanks
-john