Re: 2.4.18 clock warps 4294 seconds

From: Per Gregers Bilse (bilse@qbfox.com)
Date: Fri Jul 26 2002 - 05:22:59 EST


On Jul 25, 10:34am, george anzinger <george@mvista.com> wrote:
> You have the number a bit low. If I recall, this is an 800

Yes, I figured that, and bumped it up a lot (to 1e9). Of course, since
setting the trap, things have been fine, including no loss of NTP synch.-/
Let's see over the weekend.

> The first thing I would check is that you are using DMA for
> you disc transfers. To the best of my knowledge, the

Yes, both machines and both disks use DMA, and also allow interrupts
("unmaskirq" option) during disk transfers, here's from hdparm(8):

/dev/hda:
 multcount = 16 (on)
 I/O support = 1 (32-bit)
 unmaskirq = 1 (on)
 using_dma = 1 (on)
 keepsettings = 0 (off)
 nowerr = 0 (off)
 readonly = 0 (off)
 readahead = 8 (on)
 geometry = 2434/255/63, sectors = 39102336, start = 0

The only slightly unusual thing is that both machines use soft RAID,
I don't know if that code might be doing something. But the problem
occurred at the same time as I made an application change (debug/logging)
that vastly -reduced- disk I/O.

Anyway, I've been looking through archived log files, and found a few
entries from the 2.4.7-10 kernel that looked interesting, here's a pair:

Feb 23 04:07:52 vulpes kernel: probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.
Feb 23 04:07:52 vulpes kernel: probable hardware bug: restoring chip configuration.

Both machines indeed have identical VIA686a motherboards. The messages
come from code in timer_interrupt() in time.c:

                /* read Pentium cycle counter */

                rdtscl(last_tsc_low);

                spin_lock(&i8253_lock);
                outb_p(0x00, 0x43); /* latch the count ASAP */

                count = inb_p(0x40); /* read the latched count */
                count |= inb(0x40) << 8;

                /* VIA686a test code... reset the latch if count > max */
                if (count > LATCH) {
                        static int last_whine;
                        outb_p(0x34, 0x43);
                        outb_p(LATCH & 0xff, 0x40);
                        outb(LATCH >> 8, 0x40);
                        count = LATCH - 1;
                        if(time_after(jiffies, last_whine))
                        {
                                printk(KERN_WARNING "probable hardware bug: clock timer configuration lost - probably a VIA686a motherboard.\n");
                                printk(KERN_WARNING "probable hardware bug: restoring chip configuration.\n");
                                last_whine = jiffies + HZ;
                        }
                }

                spin_unlock(&i8253_lock);

The "if (count > LATCH)" block has been taken out of the 2.4.18
kernel, while similar code is in do_slow_gettimeoffset() in both
the 2.4.7-10 and 2.4.18 kernels. I'm not sufficiently familiar
with the hardware and the code to know if this is significant,
but it does seem that there are some known hardware bugs which
the earlier kernel tried to address (but with limited or no success).

Anyway, let's see what happens over the weekend.

Thanks.

  -- Per

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Jul 30 2002 - 14:00:23 EST