Re: frequent lockups in 3.18rc4

From: David Lang
Date: Fri Dec 12 2014 - 15:08:33 EST


On Fri, 12 Dec 2014, Linus Torvalds wrote:

I'm also not sure if the bug ever happens with preemption disabled.
Sasha, was that you who reported that you cannot reproduce it without
preemption? It strikes me that there's a race condition in
__cond_resched() wrt preemption, for example: we do

__preempt_count_add(PREEMPT_ACTIVE);
__schedule();
__preempt_count_sub(PREEMPT_ACTIVE);

and in between the __schedule() and __preempt_count_sub(), if an
interrupt comes in and wakes up some important process, it won't
reschedule (because preemption is active), but then we enable
preemption again and don't check whether we should reschedule (again),
and we just go on our merry ways.

Now, I don't see how that could really matter for a long time -
returning to user space will check need_resched, and sleeping will
obviously force a reschedule anyway, so these kinds of races should at
most delay things by just a tiny amount,

If the machine has NOHZ and has a cpu bound userspace task, it could take quite a while before userspace would trigger a reschedule (at least if I've understood the comments on this thread properly)

David Lang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/