[PATCH] Re: matrox+history+xoff=fbcon/linux crash?

From: Petr Vandrovec (vandrove@vc.cvut.cz)
Date: Mon May 29 2000 - 16:49:42 EST


On Mon, May 29, 2000 at 11:34:08PM +0000, Petr Vandrovec wrote:
> > Anyone else have this configuration and can verify the crash?
> Hmm. It is quite reproducible here on dual PIII SMP G400 too :-(
> Probably it is time to start using 'video=scrollback:0' again...
> Has anybody local filesystem which does not need fsck after kernel crash?!
Hi again,
  I found the offender, but it would be good if someone more
knowledgeable could confirm it...
  When the problem happens, the stack trace on the first CPU (the CPU
that was already in the console subsystem when the other one arrived) looks like:

CPU: 1
EIP: 0010:[<c010c2e5>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000096
eax: 0000001d ebx: d0800000 ecx: c023ba34 edx: c023ba28
esi: c024e2b0 edi: d0800010 ebp: 00000010 esp: c15bbe1c
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c15bb000)
Stack: c15bbe20 00000018 c01a214d c021a23f c02b6ee0 c1532070 00000038 c1544000
       ffffffff c1544000 00000001 01e701e0 0000007f 00000010 21a00010 00000001
       c01a21e1 07070707 00000000 c02b6ee0 c1532078 00000004 0000021a 00000038
Call Trace: [<c01a214d>] [<c021a23f>] [<c01a21e1>] [<c0194519>] [<c0196ac7>] [<c0175053>] [<c0121afb>]
       [<c0121a1c>] [<c010d914>] [<c0109690>] [<c0109690>] [<c010bf0c>] [<c0109690>] [<c0109690>] [<c0100018>]
       [<c01096bd>] [<c0109702>] [<c010bf0c>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010c2e5 <printstate+9/2c> <=====
Trace; c01a214d <matrox_cfbX_putcs+32d/344>
Trace; c021a23f <dm_head_vals.930+7877/7cf5>
Trace; c01a21e1 <matrox_cfb8_putcs+7d/88>
Trace; c0194519 <fbcon_redraw_softback+201/2e4>
Trace; c0196ac7 <fbcon_scrolldelta+167/2b8>
Trace; c0175053 <console_softint+103/11c>
Trace; c0121afb <tasklet_action+4f/7c>
Trace; c0121a1c <do_softirq+5c/8c>
Trace; c010d914 <do_IRQ+e4/f4>
Trace; c0109690 <default_idle+0/34>
Trace; c0109690 <default_idle+0/34>
Trace; c010bf0c <ret_from_intr+0/20>
Trace; c0109690 <default_idle+0/34>
Trace; c0109690 <default_idle+0/34>
Trace; c0100018 <startup_32+18/c7>
Trace; c01096bd <default_idle+2d/34>
Trace; c0109702 <cpu_idle+3e/54>
Trace; c010bf0c <ret_from_intr+0/20>
Code; c010c2e5 <printstate+9/2c>
[code stripped, it is only popad; ret]

This is the stack trace of the CPU that should not have come here...

CPU: 0
EIP: 0010:[<c010c2e5>]
EFLAGS: 00000286
eax: 0000001e ebx: 21a00000 ecx: c023ba34 edx: c023ba28
esi: 00000008 edi: 00000130 ebp: 00000010 esp: cfd49d54
ds: 0018 es: 0018 ss: 0018
Process ls (pid: 16, stackpage=cfd49000)
Stack: cfd49d58 00000018 c01a1f38 c021a220 c02b6ee0 c152c04c 00000026 c1544000
       00000003 c1544000 00000001 c1544000 0000007f 00000010 21a00010 00000001
       c01a21e1 07070707 00000000 c02b6ee0 c152c04c 00000004 0000021a 00000026
Call Trace: [<c01a1f38>] [<c021a220>] [<c01a21e1>] [<c0194519>] [<c0196ac7>] [<c0196c35>] [<c019411b>]
       [<c0171e3e>] [<c017563a>] [<c017cb36>] [<c017f211>] [<c016c643>] [<c017ef78>] [<c0135cfe>] [<c010be4c>]
Code: 50 1e 06 50 55 57 56 52 51 53 89 e0 50 e8 a9 fe ff ff 83 c4

>>EIP; c010c2e5 <printstate+9/2c> <=====
Trace; c01a1f38 <matrox_cfbX_putcs+118/344>
Trace; c021a220 <dm_head_vals.930+7858/7cf5>
Trace; c01a21e1 <matrox_cfb8_putcs+7d/88>
Trace; c0194519 <fbcon_redraw_softback+201/2e4>
Trace; c0196ac7 <fbcon_scrolldelta+167/2b8>
Trace; c0196c35 <fbcon_set_origin+1d/24>
Trace; c019411b <fbcon_cursor+57/1c8>
Trace; c0171e3e <set_cursor+6e/80>
Trace; c017563a <con_flush_chars+12/18>
Trace; c017cb36 <opost_block+15a/174>
Trace; c017f211 <write_chan+299/3a4>
Trace; c016c643 <tty_write+24b/340>
Trace; c017ef78 <write_chan+0/3a4>
Trace; c0135cfe <sys_write+de/100>
Trace; c010be4c <system_call+34/38>
Code; c010c2e5 <printstate+9/2c>

[code stripped; it is only popad; ret]

So the console code forgot to take a lock somewhere between opost_block
and matrox_cfb8_putcs (I think between opost_block and fbcon_set_origin,
as scrollback is not reentrant either...). So spin_lock(&console_lock);
is missing either in con_flush_chars() or in set_cursor(), and we cannot
put it into set_cursor(), because set_cursor() is also invoked by
vt_console_print(), which carries the note "Call me with console_lock
held only...". Spinlocks are not recursive, so taking console_lock again
inside set_cursor() on that path would deadlock.

There are four other callers of set_cursor() which look suspicious
to me: update_region(), redraw_screen(), unblank_screen() and
putconsxy(). I have no idea what the semantics of these functions are,
and I have to go home... But I think that at least putconsxy() should
take the lock too.

So this patch is only a minimal one. I was not able to trigger the crash
with this patch applied, but... there are still four unchecked entry points above...
                                                Petr Vandrovec
                                                vandrove@vc.cvut.cz

P.S.: Another solution is to disable scrollback. Then set_cursor() only
shows the cursor, and that operation can be made reentrant at the
vgacon/fbdev level (since the software cursor blinks from a bottom half,
these procedures are already reentrant... almost...).

P.P.S.: Alan, I do not know whether to apply it or to wait for
someone else's approval...

diff -urdN linux/drivers/char/console.c linux/drivers/char/console.c
--- linux/drivers/char/console.c Sun Apr 2 22:20:27 2000
+++ linux/drivers/char/console.c Mon May 29 21:05:03 2000
@@ -2290,10 +2290,13 @@
 
 static void con_flush_chars(struct tty_struct *tty)
 {
+        unsigned long flags;
         struct vt_struct *vt = (struct vt_struct *)tty->driver_data;
 
         pm_access(pm_con);
+        spin_lock_irqsave(&console_lock, flags);
         set_cursor(vt->vc_num);
+        spin_unlock_irqrestore(&console_lock, flags);
 }
 
 /*




This archive was generated by hypermail 2b29 : Wed May 31 2000 - 21:00:22 EST