Re: [2.1.101] Double Oops deadlocks SMP kernel

Gabriel Paubert (paubert@iram.es)
Wed, 20 May 1998 03:07:51 +0200 (METDST)


[Note to Ingo: I was replying to the original post when I received your
message :-)]

On Tue, 19 May 1998, Marcus Meissner wrote:

> Hi,
>
> Using wine I can manage to deadlock the Linux kernel.
>
> Machine is: AMD K6/200, 64MB
> Kernel: Linux 2.1.101, SMP (on UP) and UP.
>
> (rest of configuration and .config is irrelevant for this case)
>
> On quitting a debugged WINE executable following Oops appears:
>
> |invalid TSS: 01e4
> |CPU: 0
> |EIP: 0010:[<c01112c0>]
> |EFLAGS: 00000282
> |eax: c01e071c ebx: c2820000 ecx: c2821ef0 edx: 00000080
> |esi: c0106000 edi: 00001e80 ebp: c2821f04 esp: c2821edc
> |ds: 0018 es: 0018 ss: 0018
> |Process wine (pid: 182, process nr: 35, stackpage=c2821000)
> |Stack: general protection: 01e4
> |CPU: 0
> |EIP: 0010:[<c010ab75>]
> |EFLAGS: 00010046
> |eax: c26d8280 ebx: c2821edc ecx: c3a3e000 edx: c01e0a50
> |esi: 00000000 edi: c2821edc ebp: c2821ea8 esp: c2821e40
> |ds: 0018 es: 0018 ss: 0018
> |Process wine (pid: 182, process nr: 35, stackpage=c2821000)
> |Stack: 000001e7 c2821ea8 c2821ea8 00001e80 c2821f04 c28201ce 00000202 00000082
> | c01f0018 c010accc c2821ea8 c01c0c8f c01c0d0a 000001e4 000001e4 c010af65
> | c01c0d0a c2821ea8 000001e4 0000000b c2820000 c2820000 c0106000 c010a8e0
> |Call Trace: [<c010accc>] [<c01c0c8f>] [<c01c0d0a>] [<c010af65>] [<c01c0d0a>] [<c0106000>] [<c010a8e0>]
> | [<c0106000>] [<c01112c0>] [<c0110d7c>] [<c012d8c0>] [<c012dc31>] [<c010e73a>] [<c010a768>]
> |Code: 0f a1 83 c7 04 50 68 5f 0c 1c c0 e8 7b 8d 00 00 83 c4 08 46
>

I have an explanation for that one: the modify_ldt code in 2.1.xx does not
check whether the descriptor you modify is one currently loaded in a
segment register. This can result in strange behaviour. 2.0.xx, at least,
performs a check like the following in arch/i386/kernel/ldt.c:

	if (regs->fs == sel || regs->gs == sel)
		return -EBUSY;

This check is not present in 2.1.xx. Hence the system will happily clear
an LDT entry for a segment whose selector is still in one of the registers.
There are numerous M$ programs which free a segment but keep its selector
in a register. Of course this is a bug... Now under 2.1.xx nothing happens
until there is a task switch back to the original task, which has freed
the segment but kept its selector in fs or gs. In this case the invalid
segment is reloaded during the task switch and, while it would normally
cause exception 13 (the GPF for which Doze is famous), it causes an
invalid TSS exception in this context.
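
To make the difference concrete, here is a small user-space sketch of the
2.0.xx-style guard. It is only a model, not kernel code: toy_regs,
toy_ldt_present, toy_ldt_selector and toy_clear_ldt_entry are names I made
up for illustration, and ignoring the RPL bits in the comparison is my own
choice rather than something taken from ldt.c.

```c
#include <errno.h>

/* Hypothetical stand-in for the saved user registers (not struct pt_regs). */
struct toy_regs {
	unsigned short fs, gs;
};

#define TOY_LDT_ENTRIES 16

/* 1 if the toy LDT entry is in use, 0 if it has been cleared. */
static int toy_ldt_present[TOY_LDT_ENTRIES];

/* x86 selector layout: index in bits 3..15, TI in bit 2 (1 = LDT), RPL in bits 0..1. */
static unsigned short toy_ldt_selector(unsigned int idx)
{
	return (unsigned short)((idx << 3) | 4 | 3);
}

/*
 * Clear a toy LDT entry, but refuse with -EBUSY while fs or gs still
 * holds a selector for it. The RPL bits are masked off before comparing
 * (an assumption of this sketch).
 */
static int toy_clear_ldt_entry(const struct toy_regs *regs, unsigned int idx)
{
	unsigned short sel;

	if (idx >= TOY_LDT_ENTRIES)
		return -EINVAL;

	sel = toy_ldt_selector(idx) & ~3;
	if ((regs->fs & ~3) == sel || (regs->gs & ~3) == sel)
		return -EBUSY;

	toy_ldt_present[idx] = 0;
	return 0;
}
```

Without the -EBUSY branch the entry is cleared while fs still points at it,
which is the 2.1.xx situation described above.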

The contents of the stack are then printed with the following code:

	for(i=0; i < kstack_depth_to_print; i++) {
		if (((long) stack & 4095) == 0)
			break;
		if (i && ((i % 8) == 0))
			printk("\n       ");
		printk("%08lx ", get_seg_long(ss,stack++));
	}

in which get_seg_long is a macro which uses fs as a temporary and
thus causes it to be reloaded. This is the reason for the nested exception,
and it seems it dies because it tries to get the die_lock, which it
already holds. Note the oddity of task switches on Intel: the processor
loads all the selectors (the visible part), but the cached values are
invalid at this time. This means that when you get an invalid TSS fault,
some apparently innocuous code sequence like a push/pop of a segment
register can cause an exception. To my knowledge this is the only context
in which it can happen.
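
The self-deadlock can be modelled in a few lines of user-space C. This is
a sketch under my own assumptions, not the code in traps.c: toy_lock,
toy_lock_acquire and toy_die are invented names, and where a real
non-recursive spinlock would spin forever the model returns -1 so that
the program terminates.

```c
/* Hypothetical model of a non-recursive lock that remembers its owner. */
struct toy_lock {
	int held;
	int owner_cpu;
};

/*
 * Returns 0 if the lock was taken, -1 if the same CPU already holds it
 * (a real spinlock would spin forever here), and -2 if another CPU
 * holds it (a real spinlock would simply wait).
 */
static int toy_lock_acquire(struct toy_lock *l, int cpu)
{
	if (l->held)
		return l->owner_cpu == cpu ? -1 : -2;
	l->held = 1;
	l->owner_cpu = cpu;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = 0;
}

/*
 * Model of the Oops path: take the lock, dump state, release it.
 * nested_faults > 0 means the dump itself faults (as the fs reload does
 * above), which re-enters this function on the same CPU.
 */
static int toy_die(struct toy_lock *die_lock, int cpu, int nested_faults)
{
	int ret = toy_lock_acquire(die_lock, cpu);

	if (ret)
		return ret;	/* would hang here in the real kernel */

	if (nested_faults > 0) {
		ret = toy_die(die_lock, cpu, nested_faults - 1);
		if (ret)
			return ret;	/* lock is never released, as in the hang */
	}

	toy_lock_release(die_lock);
	return 0;
}
```

A single Oops (nested_faults == 0) completes and releases the lock; one
nested fault on the same CPU is enough to get the -1 case.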

Now I'm not sure that the solution of returning -EBUSY in 2.0.xx was the
right one. The other possibility is to clear the offending segment
register. I don't know anything about Doze programming and I refuse
to learn ;)
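
For what it is worth, the second possibility could look roughly like the
sketch below. Again this is a user-space model with made-up names
(sketch_regs, sketch_zap_stale_selectors), not a patch: it replaces a
stale selector in the saved fs/gs with the null selector, which can be
loaded into a data segment register without faulting. Masking the RPL
bits before comparing is an assumption of the sketch.

```c
/* Hypothetical stand-in for the saved user segment registers. */
struct sketch_regs {
	unsigned short fs, gs;
};

/*
 * Replace any saved fs/gs value that refers to the freed selector with
 * the null selector (0). Returns the number of registers cleared.
 */
static int sketch_zap_stale_selectors(struct sketch_regs *regs,
				      unsigned short freed_sel)
{
	int cleared = 0;

	if ((regs->fs & ~3) == (freed_sel & ~3)) {
		regs->fs = 0;
		cleared++;
	}
	if ((regs->gs & ~3) == (freed_sel & ~3)) {
		regs->gs = 0;
		cleared++;
	}
	return cleared;
}
```

The program would then take an ordinary GPF when it next uses the
register, instead of the task switch blowing up.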

Gabriel.
