Re: [PATCH 6/7] x86/vdso: Add vDSO functions for user wait instructions

From: Andy Lutomirski
Date: Tue Jul 31 2018 - 17:39:24 EST


On Tue, Jul 31, 2018 at 2:22 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> On Mon, 23 Jul 2018, Andy Lutomirski wrote:
>> On 07/23/2018 05:55 AM, Fenghua Yu wrote:
>> > static void __init init_vdso_funcs_data(void)
>> > {
>> > + struct system_counterval_t sys_counterval;
>> > +
>> > if (static_cpu_has(X86_FEATURE_MOVDIRI))
>> > vdso_funcs_data.movdiri_supported = true;
>> > if (static_cpu_has(X86_FEATURE_MOVDIR64B))
>> > vdso_funcs_data.movdir64b_supported = true;
>> > + if (static_cpu_has(X86_FEATURE_WAITPKG))
>> > + vdso_funcs_data.waitpkg_supported = true;
>> > + if (static_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
>> > + vdso_funcs_data.tsc_known_freq = true;
>> > + sys_counterval = convert_art_ns_to_tsc(1);
>> > + vdso_funcs_data.tsc_per_nsec = sys_counterval.cycles;
>> > + }
>>
>> You're losing a ton of precision here. You might even be losing *all* of the
>> precision and malfunctioning rather badly.
>
> Indeed.
>
>> The correct way to do this is:
>>
>> tsc_counts = ns * mul >> shift;
>
>> and the vclock code illustrates it.
>
> Albeit you cannot use the TSC mult/shift pair as that is for the TSC to
> nsec conversion.
>
> To get the proper mult/shift pair use clocks_calc_mult_shift(). Note that
> the scaled math has an upper limit when using 64 bit variables. You might
> need 128bit scaled math to make it work correctly.
>
>> convert_art_ns_to_tsc() is a bad example because it uses an expensive
>> division operation for no good reason except that no one bothered to
>> optimize it.
>
> Right. It's not a hot path function and it does the job and we would need
> 128bit scaled math to avoid mult overflows.
>
> Aside of that I have no idea why anyone would use convert_art_ns_to_tsc()
> for anything else than converting art to nsecs.
>
>> > +notrace int __vdso_umwait(int state, unsigned long nsec)
>>
>> __vdso_umwait_relative(), please. Because some day (possibly soon) someone
>> will want __vdso_umwait_absolute() and its friend __vdso_read_art_ns() so they
>> can do:
>>
>> u64 start = __vdso_read_art_ns();
>
> Errm. No. You can't read ART. ART is only used by decives to which it is
> distributed. You can only read TSC here and convert that to nsecs.

Bah.

But my point remains -- I think that the user (non-vDSO) code should
think in nanoseconds, not TSC ticks. That we have have a much better
chance of getting migration right.

>
>> __vdso_umonitor(...);
>> ... do something potentially slow or that might fault ...
>> __vdso_umwait_absolute(start + timeout);
>
> That definitely requires 128bit scaled math to work correctly, unless you
> make the timeout relative before conversion.
>
> But I really think we should avoid creating yet another interface to
> retrieve TSC time in nsecs. We have enough of these things already.
>
> Ideally we'd use CLOCK_MONOTONIC here, but that needs more thought as:
>
> 1) TSC might be disabled as the timekeeping clocksource
>
> 2) The mult/shift pair for converting to nanoseconds is affected by
> NTP/PTP so it can be different from the initial mult/shift pair for
> converting nanoseconds to TSC.
>
> A possible solution would be to use CLOCK_MOTONIC_RAW which is not affected
> by NTP/PTP adjustments. But that still has the issue of TSC not being the
> timekeeping clocksource. Bah, the whole TSC deadline mode sucks. I have no
> idea what's wrong with simple down counters. They Just Work.


I think it's not totally crazy to declare UMWAIT on a system with a
non-TSC clocksource to be unsupported.