Re: scheduling with io_lock held in 2.4.6

From: Manfred Spraul (manfred@colorfullife.com)
Date: Sun Aug 19 2001 - 15:42:48 EST


> I'll check if IRQs are really masked (how?).

__save_flags() and check IIRC bit 9 - but I doubt that this is your
problem.

> But how on earth
> could somebody else release the IRQs on the same CPU while the io
> spinlock is held? We'd have to schedule once to allow somebody else to
> release IRQs first .. (infinite mental regress follows).

Backtraces contain stale functions from previous calls.

The first entry is always correct, then you must manually check that the
calls are possible, i.e.
> Trace; c011b548 <do_exit+3b4/3d4>
> Trace; c0107a38 <do_divide_error+0/9c>
> Trace; c0113683 <do_page_fault+3f3/510>
> Trace; c0113290 <do_page_fault+0/510>

either do_page_fault or do_divide_error are stale.
+0 means that someone pushed the function address on the stack - if a
function calls another function, then the offset is at least +6 on i386.

check if do_page_fault+3f3 actually calls do_exit, and if do_exit+3b4
calls exit_notify etc. Try clear all stale entries from the callchain.
Note that the address pushed is behind the call instruction, and call
instructions are usually 5 or 6 bytes long. Disassemble from +3d0 and
check the call.

And are you sure that schedule() was actually called from within an
interrupt? I don't see the do_IRQ()//handle_IRQ_event.

I bet it was a schedule() within spin_lock_bh().

> Even when I rebooted
> with noapic (I seem to recall). Booting back to 2.4.3 cured it.

Which network card do you use?
Networking is the main user of spin_lock_bh(), and since 2.4.4 zero-copy
networking is merged.

Good luck,
    Manfred

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 23 2001 - 21:00:31 EST