Is this behavior in clockevents_program_event() correct?

From: Jason Zhang
Date: Mon Mar 13 2017 - 10:35:46 EST


Hi, timer experts,

I have a question on the above function, and would appreciate your thoughts.

In debugging a kernel hrtimer stalling issue on RHEL7.x kernel, we
found a case when this function was called from hrtimer_interrupt(),
value of expires was before ktime_get(), not sure how that got there,
could it be ktime was running behind?. Anyways, that leaves a negative
delta, and since force was not set, clockevents_program_event()
skipped programming the unerlying device at (2), but next_event was
set to expires at (1).

Other part of kernel may make decisions based on next_event, e.g. in
3.10 kernel, tick_nohz_stop_sched_tick() compares next_event with
next_tick, and will skip programming the device if they match. And
that caused the hrtimer to stall on the CPU.

I have not gone through all the places that reference next_event to
know if it can be an issue else where or in the latest kernel, but
this side effect does not look right. So I want to raise the question
here:

Should clock_event_device->next_event be kept consistent with what's
programmed in the underlying device?

Thanks
-Jason

kernel/time/clockevents.c:
int clockevents_program_event(struct clock_event_device *dev, ktime_t expires,
bool force)
{
unsigned long long clc;
int64_t delta;
int rc;

if (unlikely(expires < 0)) {
WARN_ON_ONCE(1);
return -ETIME;
}

dev->next_event = expires; <------ (1)

if (clockevent_state_shutdown(dev))
return 0;

/* We must be in ONESHOT state here */
WARN_ONCE(!clockevent_state_oneshot(dev), "Current state: %d\n",
clockevent_get_state(dev));

/* Shortcut for clockevent devices that can deal with ktime. */
if (dev->features & CLOCK_EVT_FEAT_KTIME)
return dev->set_next_ktime(expires, dev);

delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
if (delta <= 0)
return force ? clockevents_program_min_delta(dev) :
-ETIME; <--- (2)

delta = min(delta, (int64_t) dev->max_delta_ns);
delta = max(delta, (int64_t) dev->min_delta_ns);

clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
rc = dev->set_next_event((unsigned long) clc, dev);

return (rc && force) ? clockevents_program_min_delta(dev) : rc;
}