Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples

From: John Stultz
Date: Tue Feb 19 2013 - 15:35:35 EST


On 02/19/2013 12:15 PM, Thomas Gleixner wrote:
On Tue, 19 Feb 2013, Thomas Gleixner wrote:
On Tue, 19 Feb 2013, John Stultz wrote:
Would be interesting to compare and contrast that. Though you can't do
that in the kernel as the write hold time of the timekeeper seq is way
larger than the gtod->seq write hold time. I have a patch series in
work which makes the timekeeper seq hold time almost as short as that
of gtod->seq.
As a side note. There is a really interesting corner case
vs. virtualization.

VCPU0                                        VCPU1

update_wall_time()
  write_seqlock_irqsave(&tk->lock, flags);
  ....

Host schedules out VCPU0

Arbitrary delay

Host schedules in VCPU0
                                             __vdso_clock_gettime()#1
  update_vsyscall();
                                             __vdso_clock_gettime()#2

Depending on the length of the delay which kept VCPU0 from
executing, and on the direction of the NTP update of the
timekeeping variables, __vdso_clock_gettime()#2 can observe time going
backwards.
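
The arithmetic behind that can be sketched roughly as follows. This is a simplified userspace model, not the kernel code: struct gtod_snapshot, read_ns() and publish_update() are hypothetical names, and the model assumes the update accumulates time up to the counter value t0 that VCPU0 read before it was scheduled out, then switches to the new mult.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified copy of the data a vDSO reader uses. */
struct gtod_snapshot {
    uint64_t cycle_last; /* counter value the snapshot is based on */
    uint64_t base_ns;    /* time in ns at cycle_last */
    uint32_t mult;       /* cycles-to-ns multiplier */
    uint32_t shift;      /* cycles-to-ns shift */
};

/* ns = base_ns + (((now - cycle_last) * mult) >> shift) */
static uint64_t read_ns(const struct gtod_snapshot *g, uint64_t now)
{
    assert(now >= g->cycle_last);
    return g->base_ns + (((now - g->cycle_last) * g->mult) >> g->shift);
}

/*
 * Assumed model of the update: time up to t0 is accumulated with the
 * old mult, and everything after t0 is scaled with new_mult.
 */
static struct gtod_snapshot publish_update(const struct gtod_snapshot *old,
                                           uint64_t t0, uint32_t new_mult)
{
    struct gtod_snapshot next = *old;

    next.base_ns = read_ns(old, t0);
    next.cycle_last = t0;
    next.mult = new_mult;
    return next;
}
```

In this model, read #1 at t1 uses the stale snapshot and scales the whole delay (t1 - t0) with the old mult, while read #2 at t2 uses the published snapshot and scales (t2 - t0) with the new, lower mult. If the delay is long enough, the difference outweighs the few cycles between the two reads, and read #2 returns a smaller value than read #1.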

You can reproduce that by pinning VCPU0 to physical core 0 and VCPU1
to physical core 1. Now remove all load from physical core 1 except
VCPU1 and put massive load on physical core 0 and make sure that the
NTP adjustment lowers the mult factor.

Fun, isn't it ?

Yeah, this has always worried me. I had a patch for this way, way back that blocked vdso readers for the entire timekeeping update. But it was ugly, hurt performance, and no one seemed to be hitting the window you hit above. Nonetheless, you're probably right: we should find a way to do it right. I'll try to revive those patches.
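
For illustration only, the general idea of keeping readers out for the whole update could look something like the sketch below. It is not the old patch and not the kernel's seqlock API: struct demo_time and the demo_* helpers are hypothetical, and it uses C11 atomics with default (sequentially consistent) ordering for simplicity rather than speed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reader-visible time data guarded by a sequence count. */
struct demo_time {
    atomic_uint seq;          /* odd while an update is in progress */
    _Atomic uint64_t base_ns;
    _Atomic uint32_t mult;
};

static void demo_time_init(struct demo_time *t, uint64_t base_ns, uint32_t mult)
{
    atomic_init(&t->seq, 0);
    atomic_init(&t->base_ns, base_ns);
    atomic_init(&t->mult, mult);
}

/* Called at the very start of the update: seq becomes odd. */
static void demo_write_begin(struct demo_time *t)
{
    atomic_fetch_add(&t->seq, 1);
}

/* Called only after the new values are published: seq becomes even. */
static void demo_write_end(struct demo_time *t)
{
    unsigned int prev = atomic_fetch_add(&t->seq, 1);

    assert(prev & 1); /* must pair with demo_write_begin() */
}

/* Returns false if an update was in progress or raced with the read. */
static bool demo_try_read(struct demo_time *t, uint64_t *base_ns, uint32_t *mult)
{
    unsigned int start = atomic_load(&t->seq);

    if (start & 1)
        return false;
    *base_ns = atomic_load(&t->base_ns);
    *mult = atomic_load(&t->mult);
    return atomic_load(&t->seq) == start;
}
```

A reader would loop on demo_try_read() until it returns true. The cost is the one mentioned above: if the writer is delayed between demo_write_begin() and demo_write_end(), every reader spins for the whole delay.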

thanks
-john


