Re: WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()

From: Nick Piggin
Date: Thu Jan 15 2009 - 05:54:21 EST


On Thursday 15 January 2009 21:00:13 Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> > On Tuesday 13 January 2009 23:34:25 Ingo Molnar wrote:
> > > * Michal Hocko <mhocko@xxxxxxx> wrote:
> > > > Hi,
> > > > I am not sure whether someone has already reported this, but
> > > > I can see the following early boot WARNING with the 2.6.29-rc1 kernel
> > > > in the serial console output:
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: at kernel/sched.c:4440 sub_preempt_count+0x81/0x95()
> > >
> > > That one should be fixed in tip/master and it is in the to-Linus queue
> > > of fixes:
> > >
> > > http://people.redhat.com/mingo/tip.git/README
> > >
> > > it's this commit:
> > >
> > > 01e3eb8: Revert "sched: improve preempt debugging"
> > >
> > > if you want to cherry-pick the fix.
> >
> > OK, but I still don't think this is the actual problem, but there is
> > something amiss in the init code being exposed by it.
>
> the warnings triggered after a softirq, and there's already preempt-leak
> checks in the softirq code - so we can exclude that.
>
> a hardirq might have leaked a preempt count - but that would have quite
> bad effects [with quick atomic check asserts in schedule()], wouldnt it?
> So i tend to think that this is a false positive.
>
> One problem i can think of (and which i outlined in the revert commit log)
> is that if a hardirq hits this window in lock_kernel():
>
> void __lockfunc lock_kernel(void)
> {
> 	int depth = current->lock_depth+1;
> <-------------- HERE

current->lock_depth is not yet modified at this point.

> 	if (likely(!depth))
> 		__lock_kernel();

And here we increment preempt_count when taking the BKL, but we
have not yet modified current->lock_depth, so preempt debugging
proceeds unaffected by my patch at this point, regardless of
which interrupts are taken.

So there is a race window, but it errs on the safe side and
cannot trigger a false report.

> 	current->lock_depth = depth;
> }

At this point, the BKL's contribution to preempt_count will be taken
into account in the underflow checking of subsequent preempt count
decrements.


> then we have kernel_locked() already true (it checks lock_depth), but the
> preempt count is not elevated yet via __lock_kernel(). So if we return
> from the hardirq [and run into softirqs that end with a preempt_enable() -
> a pure hardirq exit has no preempt debug check] we'll incorrectly think
> that there's a preempt leak going on.

I don't think so. In any case, the BKL is taken very early in boot and
not released for a long time, so I don't think these kinds of races can
trigger here.

So I really think it is still a bug, or some other misunderstanding on
our part. I'd prefer to try to fix it, or at least discover why it is
happening.

