Matt
At 01:31 15/12/97 +0100, Karsten Weiss wrote:
>Hi kernel hackers!
>
>First of all my setup:
>
>Genuine Intel 486DX4/100
>ASUS SP3G
>SoundBlaster 16
>NE2000 (ISA)
>
>Main memory size: 48 Mbytes
>1 GenuineIntel 486 processor
>2 16550A serial ports
>1 vga+ graphics device
>1 keyboard
>SCSI devices:
> IBM OEM 0662S12
> IBM DPES-31080
> SONY CD-ROM CDU-8003A
> HP HP35480A
>PCI bus devices:
> VGA compatible device: S3 Inc. Vision 864-P (rev 0).
> Non-VGA device: Intel 82378IB (rev 3).
> Non-VGA device: NCR 53c810 (rev 1).
> Non-VGA device: Intel 82424ZX Saturn (rev 4).
>
>I'm currently using Linux 2.0.33p3 (compiled with gcc-2.7.2.1),
>libc 5.4.38 and XFree86-S3-3.3.1 on a RedHat 4.2 system (all update
>RPMs applied). The machine has been rock-solid for *YEARS* now and
>I'm using it several hours each day (see my comment about the RAM
>configuration change at the end of this mail).
>
>In the past few weeks, however, I had two full freezes in X
>(using either 2.0.31 or a 2.0.31 prepatch - I can't remember exactly).
>The freeze NEVER occurred with 2.0.32. Today, though, it happened for a
>third time with 2.0.33p3. By "freeze" I mean a complete lock-up.
>The system doesn't even reply to pings from my brother's computer. There
>was no OOPS and no syslog entry. The only pattern I can see is that I
>always had several Netscapes (3.10) running when the freeze happened.
>Today's freeze came right after a configure run of the latest
>gtk+-0.99.0 had finished (with Netscape running).
>
>Right after the third freeze I pressed the reset button. After rebooting
>I restarted the gtk+ configure run. This time I was working in the
>console and guess what: The system freeze happened again - for the first
>time in the console! Nothing else was running at this time. I don't
>know if this is the same kind of freeze I had before but anyway here's
>what I got:
>
>checking whether build environment is sane... segment not present: 0103
>CPU: 0
>EIP: 0010:[<0010974c>]
>EFLAGS: 00010246
>eax: 00000002 ebx: 00008220 ecx: fffffc18 edx: 001b1f5c
>esi: 001b1784 edi: 00000000 ebp: 00009000 esp: 001b1738
>ds: 0018 es: 0018 fs: 002b gs: 0018 ss: 0018
>Process swapper (pid: 0, process nr: 0, stackpage=001af7a8)
>Stack: 001b1f5c 0010a845 00000100 00109410 0000001f 001b1784 00000000 00009000
> ffffffda 00000018 00000018 00100018 00190018 00000070 001090b7 00000010
> 00000246 0010927d 00000000 7f6e6547 0009e200 00101ffe 00000000 001aeea8
>CallTrace: [<0010a845>] [<00109410>] [<00190018>] [<0010927d>]
>Code: 83 3d 94 f7 1a 00 00 74 02 31 db e8 24 88 00 00 eb aa 89 f6
>kfree of non-kmalloced memory: 001b17f0, next= 00000000, order=0
>kfree of non-kmalloced memory: 001b17e0, next= 00000000, order=0
>kfree of non-kmalloced memory: 001b1cf4, next= 00000000, order=0
>idle task may not sleep
>idle task may not sleep
>idle task may not sleep
>idle task may not sleep
>idle task may not sleep
>
>(I wrote this on a piece of paper and hope that all numbers are correct!)
>
>001096e0 <sys_idle>:
> 1096e0: 53 pushl %ebx
> 1096e1: 31 db xorl %ebx,%ebx
> 1096e3: a1 98 27 1d 00 movl 0x1d2798,%eax
> 1096e8: 83 78 6c 00 cmpl $0x0,0x6c(%eax)
> 1096ec: 74 12 je 109700 <sys_idle+20>
> 1096ee: b8 ff ff ff ff movl $0xffffffff,%eax
> 1096f3: 5b popl %ebx
> 1096f4: c3 ret
> 1096f5: 8d 74 26 00 leal 0x0(%esi,1),%esi
> 1096f9: 8d bc 27 00 00 leal 0x0(%edi,1),%edi
> 1096fe: 00 00
> 109700: c7 40 04 9c ff movl $0xffffff9c,0x4(%eax)
> 109705: ff ff
> 109707: 90 nop
> 109708: 85 db testl %ebx,%ebx
> 10970a: 75 06 jne 109712 <sys_idle+32>
> 10970c: 8b 1d 40 23 1b movl 0x1b2340,%ebx
> 109711: 00
> 109712: a1 40 23 1b 00 movl 0x1b2340,%eax
> 109717: 29 d8 subl %ebx,%eax
> 109719: 83 f8 21 cmpl $0x21,%eax
> 10971c: 76 12 jbe 109730 <sys_idle+50>
> 10971e: e8 7d ff ff ff call 1096a0 <hard_idle>
> 109723: eb 27 jmp 10974c <sys_idle+6c>
> 109725: 8d 74 26 00 leal 0x0(%esi,1),%esi
> 109729: 8d bc 27 00 00 leal 0x0(%edi,1),%edi
> 10972e: 00 00
> 109730: 80 3d a3 ee 1a cmpb $0x0,0x1aeea3
> 109735: 00 00
> 109737: 74 13 je 10974c <sys_idle+6c>
> 109739: 83 3d c0 e7 1a cmpl $0x0,0x1ae7c0
> 10973e: 00 00
> 109740: 75 0a jne 10974c <sys_idle+6c>
> 109742: 83 3d 94 f7 1a cmpl $0x0,0x1af794
> 109747: 00 00
> 109749: 75 0a jne 109755 <sys_idle+75>
> 10974b: f4 hlt
> 10974c: 83 3d 94 f7 1a cmpl $0x0,0x1af794
> 109751: 00 00
>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 109753: 74 02 je 109757 <sys_idle+77>
> 109755: 31 db xorl %ebx,%ebx
> 109757: e8 24 88 00 00 call 111f80 <schedule>
> 10975c: eb aa jmp 109708 <sys_idle+28>
> 10975e: 89 f6 movl %esi,%esi
>
>Here's the kernel source of the asm code:
>
>asmlinkage int sys_idle(void)
>{
>        unsigned long start_idle = 0;
>
>        if (current->pid != 0)
>                return -EPERM;
>        /* endless idle loop with no priority at all */
>        current->counter = -100;
>        for (;;)
>        {
>                /*
>                 * We are locked at this point. So we can safely call
>                 * the APM bios knowing only one CPU at a time will do
>                 * so.
>                 */
>                if (!start_idle)
>                        start_idle = jiffies;
>                if (jiffies - start_idle > HARD_IDLE_TIMEOUT)
>                {
>                        hard_idle();
>                }
>                else
>                {
>                        if (hlt_works_ok && !hlt_counter && !need_resched)
>                                __asm__("hlt");
>                }
>!!!!!!!!!->     if (need_resched)
>                        start_idle = 0;
>                schedule();
>        }
>}
>
>These are the functions of the CallTrace:
>
>CallTrace: [<0010a845>] [<00109410>] [<00190018>] [<0010927d>]
>
>0010a845: system_call+0x55 (system_call = 0010a7f0)
>00109410: init
>00190018: calc_vol+0x68 (calc_vol = 0018ffb0)
>0010927d: start_kernel+0x1ad (start_kernel = 001090d0)
>
>
>I upgraded from 24 to 48 MB some time ago *BEFORE* the freezes happened
>for the first time. Could bad SIMMs be the cause of this problem?
>Actually, I fear this is the case as there doesn't seem to be an
>obvious bug in the above code - at least not at the EIP address.
>But why are there "kfree of non-kmalloced memory" messages?
>
>Another observation: I just noticed that there are three remaining
>files in /tmp from the configure run just before the freeze (I don't
>know if it's from the first or the second configure run):
>
>-rw-r--r-- 1 root root 208 Dec 14 22:36 cca04047.i
>-rw-r--r-- 1 root root 1728 Dec 14 22:36 cca04047.s
>-rw-r--r-- 1 root root 2108 Dec 14 22:36 cca040471.o
>
>The funny thing is that those files don't contain any code but parts
>of e-mails and news postings that I've read before the freeze!
>
>Could this be an indication of buffer cache trashing? Or is this
>just the result of metadata having been written but not the data?
>
>If you need more information feel free to mail me!
>
>Good night,
>
>Karsten Weiss UUCP: karsten@addx.au.s.shuttle.de
>>ASK FOR PGP KEY< INTERNET: knweiss@trick.informatik.uni-stuttgart.de
>