Re: One more boobytrap needed for 2.2.15pre ?

From: Andris Pavenis (andris@stargate.astr.lu.lv)
Date: Wed Feb 16 2000 - 03:31:54 EST


On Tue, 15 Feb 2000, Manfred Spraul wrote:
> [cc'ed back to lk]
> I have one _idea_ what could have happened:
>
> 1) thread is running with TASK_INTERRUPTIBLE
> 2) calls schedule()
> 3) schedule calls del_from_runqueue()
> 4) ...
> 5) can't find a new task, jumps to recalculate
> 6) enables interrupts
> 7) interrupt
> 8) XXXXXXXXXXXXXX: someone changes current->state from within the interrupt.
> current->state now TASK_RUNNING
> 9) interrupt returns
> 10) recalculate finishes, jumps to repeat_schedule().
> 11) prev->state == TASK_RUNNING, so current is selected
> 12) a task that's not on the runqueue is selected!!!!!!!!!
>
> Ok, but now the question is: which interrupt changes current->state? Perhaps
> a special boobytrap in the interrupt handlers could detect that?
>
> in arch/i386/kernel/irq.c:
> * store "current->state" in a special local variable.
> * always change it to TASK_INTERRUPTIBLE.
> * before returning: check if someone changed current->state
> * restore the old value.

I rebuilt kernel with patch to get stack trace on an attempt to remove not
running task from runqueue once more some hours later in night. So I'm
including stack trace from ksymoops together with not editted output of dmesg
at end of message (affected application is kwm again)

As far as I found from earlier experience, I'm getting this problem with
recent 2.2... kernels built with all compilers I have tried (gcc-2.7.2.3,
egcs-1.1.2, gcc-2.95.2) in an identical way, so I don't think that gcc-2.95.2 I
used is guilty here (some time ago I specially tested building kernel with
various compilers, but didn't found any difference, at least for this problem)

If this is caused by some interrupt handler which messes up state of
current task then I should add some guesses what stuff could by suspected:
        More likely it's a driver of some not very common hardware (otherwise
        there would be much more error reports from other poeple). So only
        hadrware piece that commes into mind is Crystal CS4232 based sound-card
        (drivers/sound/cs4232.c).

Andris

ksymoops 2.3.3 on i586 2.2.15pre7. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.2.15pre7/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_ksyms_lsmod): module bsd_comp is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module cs4232 is in lsmod but not in ksyms, probably no symbols exported
Warning (compare_ksyms_lsmod): module smbfs is in lsmod but not in ksyms, probably no symbols exported
c10d9ea4 c22e0000 c0a10000 000000b7 c10d9ef8 001b8d0e 00000010 c010ff54
       c10d9ef8 c020bc00 c10d9f0c c010ff4b c10d9ef8 c0a3aa2c 00000000 c012d4c7
       00000000 00000000 c3317d98 00000000 00000004 00000000 c01e389c 001b8d0e
Call Trace: [<c010ff54>] [<c010ff4b>] [<c012d4c7>] [<c010fb98>] [<c012d874>] [<c012dc2d>] [<c01536af>]
       [<c0117200>] [<c01098b8>] [<c01088c4>]
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c010ff54 <schedule_timeout+78/94>
Trace; c010ff4b <schedule_timeout+6f/94>
Trace; c012d4c7 <alloc_wait+17/98>
Trace; c010fb98 <process_timeout+0/14>
Trace; c012d874 <do_select+1b8/200>
Trace; c012dc2d <sys_select+371/498>
Trace; c01536af <sock_ioctl+23/2c>
Trace; c0117200 <sys_gettimeofday+20/94>
Trace; c01098b8 <common_interrupt+18/20>
Trace; c01088c4 <system_call+34/38>

5 warnings issued. Results may not be reliable.

-----------------------------------------------------------------------------
Linux version 2.2.15pre7 (root@hal) (gcc version 2.95.2 19991024 (release)) #3 Tue Feb 15 17:58:20 EET 2000
Detected 200456095 Hz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 399.77 BogoMIPS
Memory: 63316k/65536k available (892k kernel code, 412k reserved, 852k data, 64k init)
Dentry hash table entries: 8192 (order 4, 64k)
Buffer cache hash table entries: 65536 (order 6, 256k)
Page cache hash table entries: 16384 (order 4, 64k)
CPU: Intel Pentium MMX stepping 03
Checking 386/387 coupling... OK, FPU using exception 16 error reporting.
Checking 'hlt' instruction... OK.
Intel Pentium with F0 0F bug - workaround enabled.
POSIX conformance testing by UNIFIX
PCI: PCI BIOS revision 2.10 entry at 0xfb050
PCI: Using configuration type 1
PCI: Probing PCI hardware
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0 for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
TCP: Hash tables configured (ehash 65536 bhash 65536)
Starting kswapd v 1.5
parport0: PC-style at 0x378 [SPP,PS2,EPP]
Serial driver version 4.27 with no serial options enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
ttyS01 at 0x02f8 (irq = 3) is a 16550A
pty: 256 Unix98 ptys configured
lp0: using parport0 (polling).
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.9)
PIIX4: IDE controller on PCI bus 00 dev 39
PIIX4: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
hda: FUJITSU MPE3064AT, ATA DISK drive
hdb: FUJITSU MPB3043ATU E, ATA DISK drive
hdc: 8X, ATAPI CDROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: FUJITSU MPE3064AT, 6187MB w/512kB Cache, CHS=788/255/63
hdb: FUJITSU MPB3043ATU E, 4125MB w/0kB Cache, CHS=525/255/63, UDMA
hdc: ATAPI 7X CD-ROM drive, 120kB Cache
Uniform CDROM driver Revision: 2.56
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
scsi : 0 hosts.
scsi : detected total.
ne2k-pci.c:vpre-1.00e 5/27/99 D. Becker/P. Gortmaker http://cesdis.gsfc.nasa.gov/linux/drivers/ne2k-pci.html
ne2k-pci.c: PCI NE2000 clone 'RealTek RTL-8029' at I/O 0x6500, IRQ 10.
eth0: RealTek RTL-8029 found at 0x6500, IRQ 10, 00:C0:DF:E6:B3:F1.
Partition check:
 hda: hda1 hda2 hda3
 hdb: hdb1 hdb2 < hdb5 > hdb3
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 64k freed
Adding Swap: 136544k swap-space (priority -1)
MSDOS FS: Using codepage 775
MSDOS FS: IO charset iso8859-4
MSDOS FS: Using codepage 775
MSDOS FS: IO charset iso8859-4
MSDOS FS: Using codepage 775
MSDOS FS: IO charset iso8859-4
CSLIP: code copyright 1989 Regents of the University of California
SLIP: version 0.8.4-NET3.019-NEWTTY-MODULAR (dynamic channels, max=256) (6 bit encapsulation enabled).
SLIP linefill/keepalive option.
PPP: version 2.3.7 (demand dialling)
PPP line discipline registered.
PPP BSD Compression module registered
ad1848/cs4248 codec driver Copyright (C) by Hannu Savolainen 1993-1996
YM3812 and OPL-3 driver Copyright (C) by Hannu Savolainen, Rob Hooft 1993-1996
del_from_runqueue(c10d8000) : Task not in run queue
prev_run=00000000 next_run=00000000 state=1 flags=100 nr_running=2
prev_task=c22e0000 next_task=c0a10000 pid=183
current=c10d8000 current->pid=183 current->state=1 current->flags=100
c10d9ea4 c22e0000 c0a10000 000000b7 c10d9ef8 001b8d0e 00000010 c010ff54
       c10d9ef8 c020bc00 c10d9f0c c010ff4b c10d9ef8 c0a3aa2c 00000000 c012d4c7
       00000000 00000000 c3317d98 00000000 00000004 00000000 c01e389c 001b8d0e
Call Trace: [<c010ff54>] [<c010ff4b>] [<c012d4c7>] [<c010fb98>] [<c012d874>] [<c012dc2d>] [<c01536af>]
       [<c0117200>] [<c01098b8>] [<c01088c4>]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Feb 23 2000 - 21:00:14 EST