Re: 40ms/10ms error in do_gettimeofday()

From: Boris Okun (okun@math.Vanderbilt.Edu)
Date: Wed Apr 05 2000 - 12:16:56 EST


Bernard Imbert wrote:
>
> Maybe this will help some gettimeofday() guru:
> when I have these time problems, one of the two
> following is true, in arch/i386/kernel/time.c,
> function do_poor_nanotime():
> (i) either
> (jiffies == jiffies_t) and (count > count_p)
> (but I never detect any pending timer interrupt)
>
> (ii) or
> (jiffies != jiffies_t) and (count >= 11920)
> (this 11920 is empirical...)
>
> Hope this helps someone to help me!
> Bernard

This sounds like a problem described and solved by Jason Sodergren
recently. Are you using some unofficial patches? AFAIK, there is no
do_poor_nanotime() in official kernels.

Boris

Here is Jason Sodergren's message:

Hello, everyone.

I've run into a problem with the do_slow_gettimeoffset function in kernel
2.2.14 (the code apparently hasn't changed much in newer kernels).

During calls to do_gettimeofday(), the time returned is occasionally 10 ms
behind where it should be. I've narrowed this down to what appears to be a
problem with timer underflow detection in do_slow_gettimeoffset.

Here is a copy of that function with whitespace/comments/Neptune bug
stuff stripped out for the sake of brevity:

From arch/i386/kernel/time.c:
static unsigned long do_slow_gettimeoffset(void)
{
        int count;
        static int count_p = LATCH; /* for the first call after boot */
        static unsigned long jiffies_p = 0;
        unsigned long jiffies_t;
        /* timer count may underflow right here */
        outb_p(0x00, 0x43); /* latch the count ASAP */
        count = inb_p(0x40); /* read the latched count */
        jiffies_t = jiffies;
        count |= inb_p(0x40) << 8;
        if( jiffies_t == jiffies_p ) {                  /* (1) */
                if( count > count_p ) {
                        outb_p(0x0A, 0x20);
                        if( inb(0x20) & 0x01 ) {
                                count -= LATCH;         /* (2) */
                        } else {
                                printk("do_slow_gettimeoffset(): hardware timer problem?\n");
                        }
                }
        } else
                jiffies_p = jiffies_t;
        count_p = count;
        count = ((LATCH-1) - count) * TICK_SIZE;
        count = (count + LATCH/2) / LATCH;
        return count;
}

The problem seems to be that underflow detection will not necessarily work
the first time this function is called while interrupts are disabled. For
example, in this sequence of events:

- device driver interrupt occurs, ISR is entered with interrupts disabled.
- timer underflow occurs; irq0 is now pending
- device driver ISR calls do_gettimeofday(), which calls
  do_slow_gettimeoffset()

In the above case, the condition at (1) will be false if this is the
first call to the function during the current jiffy, since the current
jiffies value is different from the value stored by the last call of the
function. Therefore, count is not compensated for timer underflow and time
seems to jump backwards 10 ms.

In subsequent calls with interrupts still disabled, the check at (1) would
return true, and the correct underflow compensation would occur.
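
The missed compensation on that first call can be reproduced away from the
hardware. What follows is only a minimal user-space sketch, not kernel code:
the port reads are replaced by function parameters, the statics by a
hypothetical struct slow_state, and LATCH = 11932 and TICK_SIZE = 10000 are
assumed values for HZ=100 on i386. The names pit_to_usec and model_original
are made up for illustration.

```c
/* User-space model of the ORIGINAL check; no port I/O is performed. */
#define LATCH     11932   /* assumed: PIT ticks per jiffy at HZ=100 */
#define TICK_SIZE 10000   /* assumed: microseconds per jiffy */

/* Stands in for the static variables of do_slow_gettimeoffset(). */
struct slow_state {
        int count_p;
        unsigned long jiffies_p;
};

/* Same conversion as the last lines of the kernel function. */
static long pit_to_usec(int count)
{
        count = ((LATCH - 1) - count) * TICK_SIZE;
        return (count + LATCH / 2) / LATCH;
}

/*
 * count:        value that would have been latched from the PIT
 * jiffies_t:    snapshot of jiffies
 * irq0_pending: nonzero if the PIC would report IRQ0 as pending
 */
static long model_original(struct slow_state *s, int count,
                           unsigned long jiffies_t, int irq0_pending)
{
        if (jiffies_t == s->jiffies_p) {
                /* Underflow is only looked for on this path. */
                if (count > s->count_p && irq0_pending)
                        count -= LATCH;
        } else {
                /* First call in this jiffy: no underflow check at all. */
                s->jiffies_p = jiffies_t;
        }
        s->count_p = count;
        return pit_to_usec(count);
}
```

With jiffies_p = 4 left over from an earlier call, a call such as
model_original(&s, 11900, 5, 1) takes the else branch and returns the same
value as pit_to_usec(11900). The compensated value, pit_to_usec(11900 - LATCH),
is exactly TICK_SIZE microseconds larger, which matches the 10 ms step
described above.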

I've modified the above function as follows; this seems to correct
the problem on my test machines:

static unsigned long do_slow_gettimeoffset(void)
{
        int count;
        unsigned char irqpend;
        /* timer count may underflow right here */
        outb(0x0A, 0x20);
        outb(0x00, 0x43); /* latch the count ASAP */
        irqpend = inb(0x20);    /* (1) get IRQ0 status as close to */
                                /* count latch time as possible */
        /* Slight chance that IRQ0 was set AFTER count was latched. */

        count = inb_p(0x40); /* read the latched count */
        count |= inb_p(0x40) << 8;

        if( irqpend & 0x01 ) /* Counter underflow? */
        {
                /* If count is small and IRQ is pending, IRQ was most likely
                   set AFTER count was latched, or an IRQ0 was lost */
                if( count > 10 ) /* (2) 10 is arbitrary */
                        count -= LATCH;
        }
        }
        count = ((LATCH-1) - count) * TICK_SIZE;
        count = (count + LATCH/2) / LATCH;
        return count;
}

Instead of checking count and jiffies against values stored during the
previous call to the function, I'm just checking for a pending IRQ0, which
I try to check as close to the latching of count as possible. There's still
the possibility that underflow occurs right after latching, resulting in
erroneous detection of underflow; that's what the check of the latched count
value at (2) tries to address. This code seems to fix the time jump problem
I've observed when using the original code.
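
The decision made by the modified function can be written down as a pure
function of the two values it reads. Again this is only a sketch under
assumptions: LATCH = 11932 and TICK_SIZE = 10000 are assumed values for
HZ=100, and offset_usec and model_patched are hypothetical names that do not
exist in the kernel.

```c
/* User-space model of the MODIFIED check; no port I/O is performed. */
#define LATCH     11932   /* assumed: PIT ticks per jiffy at HZ=100 */
#define TICK_SIZE 10000   /* assumed: microseconds per jiffy */

/* Same conversion as the last lines of the kernel function. */
static long offset_usec(int count)
{
        count = ((LATCH - 1) - count) * TICK_SIZE;
        return (count + LATCH / 2) / LATCH;
}

/*
 * count:   value that would have been latched from the PIT
 * irqpend: byte that would have been read back from port 0x20;
 *          bit 0 is treated as IRQ0, as in the code above
 */
static long model_patched(int count, unsigned char irqpend)
{
        /*
         * Compensate only when IRQ0 is pending AND the counter is not
         * near zero. A very small count with IRQ0 pending is taken to
         * mean the interrupt was raised after the latch.
         */
        if ((irqpend & 0x01) && count > 10)
                count -= LATCH;
        return offset_usec(count);
}
```

For a count of 11900 the result with IRQ0 pending is TICK_SIZE microseconds
larger than without it, whereas for a count of 5 the pending bit is ignored
and both calls return the same offset.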

It seems to me the original code is flawed. Am I missing something?
Any input is appreciated. If this IS a flaw, I'll work on the function
a bit and produce a patch.

- Jason Sodergren - jason@taiga.com - http://www.taiga.com/~jason -
          - PGPK @ http://www/taiga.com/~jason/pgp.phtml -




This archive was generated by hypermail 2b29 : Fri Apr 07 2000 - 21:00:15 EST