Re: Linux 3.19-rc5

From: Linus Torvalds
Date: Sun Feb 01 2015 - 15:09:40 EST


On Sun, Feb 1, 2015 at 11:43 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> And personally I agree. sched_annotate_sleep() looks self-documented, it
> is clear that it is used to suppress the warning.

But *that's not the problem*.

If it was just silencing the warning, things would be fine.

But it is actively screwing task state up, and actually changing
behavior of the kernel (in a very subtle part of the code too), and
doing so in ways that potentially introduce WORSE BUGS THAN THE WHOLE
DAMN DEBUG SUPPORT WAS SUPPOSED TO FIND IN THE FIRST PLACE.

That's the thing that upsets me. This is debug code. It's not even
debugging things that are "buggy" - its' just finding things that can
be potentially slightly inefficient. And it already introduced a
subtle bug once.

(Ok, it wasn't *that* subtle in the end, but it needed both a good
bisection result and some reading of unrelated source code to figure
out, so it certainly wasn't really obvious).

> Still. Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP
> adds?

Now this patch I agree with 100% percent, apart from you calling it a
"subtle change". It's more than a "subtle change", this was adding a
real and present bug that took two weeks to get fixed!

(And by "took two weeks to get fixed" I mean "not actually fixed yet,
but now we at least know what was going on").

I like your patch, but I'm going to combine it with mine that actually
fixes a real bug, because what you don't see in that patch of yours:

> --- x/include/linux/kernel.h
> +++ x/include/linux/kernel.h
> @@ -176,7 +176,7 @@ extern int _cond_resched(void);
> */
> # define might_sleep() \
> do { __might_sleep(__FILE__, __LINE__, 0); might_resched(); } while (0)
> -# define sched_annotate_sleep() __set_current_state(TASK_RUNNING)
> +# define sched_annotate_sleep() (current->task_state_change = 0)
> #else
> static inline void ___might_sleep(const char *file, int line,
> int preempt_offset) { }
> --- x/kernel/sched/core.c
> +++ x/kernel/sched/core.c
> @@ -7292,7 +7292,7 @@ void __might_sleep(const char *file, int
> * since we will exit with TASK_RUNNING make sure we enter with it,
> * otherwise we will destroy state.
> */
> - if (WARN_ONCE(current->state != TASK_RUNNING,
> + if (WARN_ONCE(current->state != TASK_RUNNING && current->task_state_change,
> "do not call blocking ops when !TASK_RUNNING; "
> "state=%lx set at [<%p>] %pS\n",
> current->state,

is that the whole "if (WARN_ONCE()" remains horribly buggy, because of
the line that follows it:

__set_current_state(TASK_RUNNING);

in the if-statement.

Now, I have the patch that removes that thing (but I was hoping to get
it from the scheduler tree before doing rc7, which seems to not have
happened), but yes, that together with your patch seems like it should
fix all the nasty bug-inducing crud where the "debugging helpers" end
up silently changing core process state.

I'll just combine it with yours to avoid extra noise in this area, and
mark you as the author, fixing *both* of the incorrect state changes.
Ok?

Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/