Re: FYI: bezerko mouse is back

Joe (josepha48@yahoo.com)
Fri, 30 Jul 1999 13:43:06 -0700 (PDT)


please see the bottom of this message for the origanal message..

I have moved this over to the kernel mailing list as well
because if this interrupt bug is as you say it is then it may
affect non smp machines in a different way then SMP machines and
may cause problems elsewhere...
>
> I had (still have, but very occasionally) a similar problem
> with 2.3.3
> and some patches on an Intel N440BX (2xPIII,256MB) system.
> I've tried
> 2.2.x kernels as well, and without my patch, I have *lots* of
> mouse
> problems.
>

again my system was doing fine for about a month then last
night it came back.. I had used a patch that allowed me to
change the APIC triggering from APIC-edge to APIC-level, and
things were okay for a month or so..

> I tracked it down to problems with the kernel's service of
> gettimeofday() calls: the routines in
> linux/arch/i386/kernel/time.c do

okay was this code between #ifdef __SMP__ or not? and how did
you track this down? if this was not between #ifdef's then
people on non SMP machines should have clock skews and interrupt
misses also..

> calculations to detect missed interrupts and (if your
> processor has a
> TSC) to improve time resolution. What would happen is that
> the missed
> interrupts calculation and/or the TSC-elapsed-time calculation
> would
> return a "negative" number which, since it was treated as
> unsigned,
> would end up advancing the clock a lot (usually about
> 4294967000uS, or

hmm that may explain why my clock is off, however it takes a
while before it skews an hour...

> a little more than an hour). Then, on the next call to
> gettimeofday(), the clock would bounce back to the correct
> time.
> However, the damage was done: on the forward bounce, the X
> screensaver
> would kick in, and on the backwards bounce, the mouse pointer
> calculations would get screwed up, resulting in erratic
> pointer
> behaviour, random bogus clicks, and so on.
>
> I made a patch that prevents gettimeofday() from returning
> times
> earlier than it returned previously, and also flags when the
> missed
> interrupts or TSC calculations return a large forward
> adjustment; this
> has solved most of my problems, but I still occasionally get
> wild time
> warps where the clock starts racing forward. I haven't been
> able to
> track this down, because once it starts, cron jobs and the
> like get
> started up (logrotate, find, makewhatis, etc) and the system
> dies a
> quick, messy death.
>
> Andrea Archangeli also made some patches that were designed to
> eliminate missed interrupts (and detect them, too), but I
> still get
> log messages about "lost ticks."
>
> Anyway, you can detect time warps pretty easily with a program
> that
> just calls gettimeofday() in a tight loop. I'll append the
> one I used
> when I was actively working on this problem. I've pretty much
> given
> up, though; Andrea's & my patches makes my machine mostly
> usable, and
> I figure eventually one of the non-devel kernels will solve
> all the
> problems the Right Way.
>
> d.
>
> * * * * *
>
> #include <stdio.h>
> #include <sys/types.h>
> #include <time.h>
> #include <sys/time.h>
> #include <unistd.h>
>
> inline long
> timeval_diff( const struct timeval *a, const struct timeval *b
> )
> {
> if (a->tv_sec < b->tv_sec ||
> (a->tv_sec == b->tv_sec && a->tv_usec < b->tv_usec))
> return -((b->tv_sec - a->tv_sec) * 1000000 +
> (b->tv_usec-a->tv_usec));
> else
> return (a->tv_sec - b->tv_sec) * 1000000 + (a->tv_usec -
> b->tv_usec);
> }
>
> int
> main( int argc, char **argv )
> {
> struct timeval now;
> struct timeval then;
> unsigned long n_fwd = 0;
> long max_fwd = 0;
> unsigned long n_back = 0;
> long max_back = 0;
> time_t last_log = time( 0 );
>
> gettimeofday( &then, 0 );
>
> for ( ; ; ) {
> long delta;
>
> gettimeofday( &now, 0 );
>
> delta = timeval_diff( &now, &then );
>
> if (delta >= 0)
> ++n_fwd;
> else
> ++n_back;
>
> if (delta > max_fwd)
> max_fwd = delta;
> else
> if (delta < max_back)
> max_back = delta;
>
> if (now.tv_sec >= last_log + 1) {
> printf( "%.19s FWD:%6ld/%6ld:BACK (%4ldus > delta >
> %4ldus)\n",
> ctime( &now.tv_sec ),
> n_fwd, n_back, max_back, max_fwd );
> n_fwd = n_back = 0;
> max_fwd = max_back = 0;
> last_log = now.tv_sec;
> }
>
> then = now;
> }
>
> return 0;
> }
>
>
*******************original message *************************
it came back last night, while using X my mouse suddenly went
bezerko again.

<breif history>
SMP machine DUAL 233 tyan tomcat IV (buggy
BIOS)
2.2.9 or later kernels
there have been problems with the APIC's on
numerous machines

a patch was sent out to change the behavior of
the interrupt
from APIC-level to APIC-edge or XT-PIC or from
anyone to another
(thanks Jos)... I used this patch to change
the behavior from
edge to level, and for a while this was
working, but last night
it came back.

I was writing some code in nedit, and had
gnome-terminal open,
and runing fvwm95 as the window manager, with
kfm and xteddy..
</end history>

I'll be switiching the APIC-level to XT-PIC as
I know that will
work. It did with 2.0 kernels and I tried it
once before and
seemed to work, but if it doesn't I'll let you
all know. Also
level seemed to be working just fine for a
while now, probably
about a month

<Good news>
I was using a program I wrote (mpstat self
plug here) to log
some information, (cpu %, interrupts, maj/min
faults) and have
that log file saved with the date in the
filename <bad news>but
not on this machine. Although this program
does log interrupts,
they are not seperate, they are summed up, so
if there is a
sudden increase it would be difficult to tell
where it was but
it is possible to guess.</bad news> Also in
viewing this there
was no increase in interrupts that I saw, but
I only did a quick
glance. (If you want this email me direcly
please and I'll send
it tonight)
</good news>

This bug effects SMP machines (APIC bug) and
causes bezerko
mouse behavior, and can also cause NFS
problems (not sure what
thou)

I know that work is being done on this to fix
the bug (not sure
of how that is going).

I have been given a temporary solution (the
patch -> change to
XT-PIC from APIC). but this means that all my
interrrupt for
that IRQ will be on the boot cpu. :-(

The original problem was believed to be in the
APIC-edge code.
But since I have been using the patch, and had
set it to
APIC-level this may no longer be true. The way
I see it there
are one of two possible problems then:
1) my mouse needs to be edge triggered but
then why did it
work for a month as lkevel triggered oh so
well?
2) the bug is not in the APIC-edge code,
but somewhere else
in the source code.

This is an illusive bug, and very hard to
reproduce and occurs
randomly. Just thought an update was
necessary.

Joe
_____________________________________________________________
Do You Yahoo!?
Free instant messaging and more at http://messenger.yahoo.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/