Re: 2.6.14 X spinning in the kernel

From: Badari Pulavarty
Date: Tue Nov 15 2005 - 17:50:23 EST


Andrew Morton wrote:

Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:

On Mon, 2005-11-14 at 16:17 -0800, Andrew Morton wrote:

Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:

My 2-cpu EM64T machine started showing this problem again on 2.6.14.
On some reboots, X seems to spin in the kernel forever.

sysrq-t output shows nothing.

X R running task 0 3607 3589 3903
(L-TLB)

top shows:
3607 root 25 0 0 0 0 R 99.1 0.0 262:04.69 X


So, I wrote a module to do smp_call_function() on all CPUs
to show stacks on them. CPU0 seems to be spinning in exit_mmap().
I did this multiple times to collect stacks few times.

Is this a known issue ?

Nope. Maybe your vma list has a loop in it, in remove_vma()? slab
debugging would detect that, due to the repeated
kmem_cache_free(vm_area_cachep, vma);

I compiled the kernel with slab debug and rebooted the machine.
X seems to be spinning again. But this time, it shows completely
different routines (and seems to be switching CPUs) :(
Something weird is happening on my machine..

top:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3600 root 25 0 0 0 0 R 99.9 0.0 8:29.18 X


...

Then I tried killing it and ran into..

CPU0:
ffffffff8053c750 0000000000000000 00000000000018ff ffff81011c9a4230
ffff81011c9a4000 ffffffff8053c788 ffffffff8026de8f
ffffffff8053c7a8
ffffffff80119591 ffffffff8053c7a8
Call Trace: <IRQ> <ffffffff8026de8f>{showacpu+47}
<ffffffff80119591>{smp_call_function_interrupt+81}
<ffffffff8010e968>{call_function_interrupt+132} <EOI>
<ffffffff880fa225>{:radeon:radeon_do_wait_for_idle+117}
<ffffffff880fa236>{:radeon:radeon_do_wait_for_idle+134}
<ffffffff880fa590>{:radeon:radeon_do_cp_idle+336}
<ffffffff880fc215>{:radeon:radeon_do_release+85}
<ffffffff88104369>{:radeon:radeon_driver_pretakedown+9}
<ffffffff802783aa>{drm_takedown+74}


ah-hah. We've had machines stuck in radeon_do_wait_for_idle() before. In
fact, my workstation was doing it a year or two back.

Are you able to identify the most recent kernel which didn't do this?

I got this machine recently. 2.6.14-rc kernels are the first ones I tried on this box and they have been failing :(

Only known good kernel I had so far was RHEL4 (2.6.9 base).

Thanks,
Badari


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/