Re: gettimeofday non-monotonic on 2.2.7 SMP

dave madden (dhm@webvision.com)
Wed, 19 May 1999 21:54:09 -0700


=>From: Andrea Arcangeli <andrea@suse.de>
=>...
=>I should have fixed all gettimeofday SMP races in the late 2.1.x stage. I
=>also did quite a bit of testing of the get_fast_time return value from irq
=>handlers at that time.
=>
=>The only things that can cause gettimeofday to return non-monotonic values
=>are a missed timer irq or a TSC that is not synchronized across the SMP
=>hardware.
=>
=>If you miss a timer irq then gettimeoffset will keep following you with
=>the right values. Then, when the next irq handler runs, you'll get a
=>wrong value, since gettimeoffset will restart from 0 and you missed an irq.
=>You should get a delta on the order of 100msec, though.
=>
=>To find out whether this is your problem, grab one of my andrea patches from:
=>
=> ftp://e-mind.com/pub/andrea/kernel/
=>
=>and tell me if you can reproduce (you'll find andrea-patches for both
=>2.2.x and 2.3.x kernel).

I built a new 2.3.3 kernel (Makefile still says 2.3.2...I guess
somebody forgot to update it?) with your patches. As you suspected:

May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from 080c1fb8
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from 40218baf
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from 40218ba8
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from 401c8360
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from 401c7fbe
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from c0107997
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from 401c82d6
May 19 21:33:46 vheissu kernel: recover_lost_timer: lost 1 tick from c0107997

c010795c T cpu_idle
c01079ac T sys_idle
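
For anyone following along, here is a minimal sketch of the lookup I did by hand against System.map. The table holds only the two entries quoted above, so results above the last entry mean nothing. The names ksyms, lookup_symbol and is_kernel_address are made up for illustration, and the 0xc0000000 boundary assumes the default i386 3G/1G split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry table copied from the System.map excerpt;
 * a real System.map has thousands of entries. */
struct ksym {
    uint32_t addr;
    const char *name;
};

static const struct ksym ksyms[] = {
    { 0xc010795cu, "cpu_idle" },
    { 0xc01079acu, "sys_idle" },
};

/* Assumption: default i386 split, kernel addresses start at 0xc0000000. */
#define KERNEL_BASE 0xc0000000u

static int is_kernel_address(uint32_t pc)
{
    return pc >= KERNEL_BASE;
}

/* Return the name of the last symbol whose start address is <= pc,
 * or NULL if pc lies below every entry. The table must be sorted. */
static const char *lookup_symbol(uint32_t pc)
{
    const char *best = NULL;
    size_t n = sizeof ksyms / sizeof ksyms[0];

    for (size_t i = 0; i < n; i++) {
        /* Guard the sortedness assumption the scan relies on. */
        assert(i == 0 || ksyms[i - 1].addr < ksyms[i].addr);
        if (ksyms[i].addr <= pc)
            best = ksyms[i].name;
    }
    return best;
}
```

With this table, lookup_symbol(0xc0107997) lands in cpu_idle, since 0xc0107997 falls between the two entries, while is_kernel_address(0x40218baf) is false.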

I also got this "lost 429495 ticks" (!) from a previous kernel build,
but I don't have the System.map to go with it. I wish I knew what was
going on here; clearly, some signed calculation is returning a
negative number which is then being cast to unsigned. I think this is
the original problem I was having, with the clock jumping back & forth
4294 seconds (429495 ticks, right?)

May 19 21:24:09 vheissu kernel: recover_lost_timer: lost 429495 ticks from c0107997
May 19 21:24:09 vheissu kernel: recover_lost_timer: lost 2 ticks from c0107997
May 19 21:24:20 vheissu kernel: recover_lost_timer: lost 1 tick from c01f2ef7
May 19 21:24:20 vheissu kernel: recover_lost_timer: lost 1 tick from c0107997
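
To show why I suspect a signed/unsigned problem, here is a small sketch of one arithmetic path that produces exactly 429495. It is not taken from the patch: the helper names are made up, and it assumes HZ=100 (10000 usec per tick) and a 32-bit microsecond delta.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: HZ=100, so one tick is 10000 microseconds. */
#define USEC_PER_TICK 10000u

/* Hypothetical buggy conversion: a negative delta is reinterpreted as a
 * huge unsigned value before the division. */
static uint32_t lost_ticks_unsigned(int32_t delta_usec)
{
    return (uint32_t)delta_usec / USEC_PER_TICK;
}

/* Hypothetical guarded conversion: a negative delta means no lost ticks. */
static int32_t lost_ticks_checked(int32_t delta_usec)
{
    if (delta_usec < 0)
        return 0;
    return delta_usec / (int32_t)USEC_PER_TICK;
}

/* Worked numbers: -10000 usec becomes 4294957296 as a 32-bit unsigned,
 * and 4294957296 / 10000 == 429495 with integer division. */
static void self_check(void)
{
    assert(lost_ticks_unsigned(-10000) == 429495u);
    assert(lost_ticks_checked(-10000) == 0);
    assert(lost_ticks_unsigned(20000) == 2u);
}
```

429495 ticks at 10 msec each is 4294.95 seconds, which is 2^32 microseconds to within one tick, so a delta of about minus one tick would account for both the log line and the size of the clock jump.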

=>If the problem is ticks lost over time, then my TSC code should also tell
=>you which piece of code masked irqs on all cpus for such a long
=>time, so you can optimize it.

A lot of the ticks are lost from 0x40... is that userland? How can I
find out which process? (I suspect the X server: the easiest way to
lose ticks is to scroll a Netscape X window.) Can the server turn off
IRQs on both CPUs?

=>In my patch I implemented a recover_lost_ticks mechanism that will detect
=>a lost timer interrupt and will update xtime to account for the lost irq.
=>This will only work with TSC enabled; if you don't have the i386 TSC you
=>will continue to lose time ;) and gettimeofday can't be
=>monotonic in the presence of a lost tick.

Is there any way to force gettimeofday not to return an earlier time?
Of course, I want the clock to be as accurate as possible, but the
time decrements are causing more trouble than a simple slow or fast
clock would. I've modified do_gettimeofday so that it maintains the
latest time it ever returned, and won't return a time earlier than
this. (The remembered time is updated in do_settimeofday so that
clock adjustments will still take effect.) This seems to help, but do
you see any problems with it?
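
Roughly, the idea looks like the userspace sketch below. This is not the actual kernel change: last_returned, clamp_monotonic and note_clock_set are illustrative names, and the sketch takes no lock, which shared state on an SMP kernel would need.

```c
#include <assert.h>
#include <sys/time.h>

/* Latest time ever handed out (hypothetical state for this sketch). */
static struct timeval last_returned;

/* Nonzero if a is strictly earlier than b. */
static int tv_before(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_usec < b->tv_usec;
}

/* Clamp a raw clock reading so the result never goes backwards. */
static struct timeval clamp_monotonic(struct timeval raw)
{
    /* Expect a normalized timeval. */
    assert(raw.tv_usec >= 0 && raw.tv_usec < 1000000);

    if (tv_before(&raw, &last_returned))
        return last_returned;   /* repeat the last value instead */
    last_returned = raw;
    return raw;
}

/* Counterpart of the settimeofday hook: a deliberate clock change
 * resets the remembered time so the new setting takes effect. */
static void note_clock_set(struct timeval new_time)
{
    last_returned = new_time;
}
```

One visible side effect is that the clock appears to stand still until the raw reading catches up with the remembered value.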

regards,
d.
