Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION

From: Paul E. McKenney
Date: Sat Dec 09 2017 - 13:10:19 EST


On Sat, Dec 09, 2017 at 02:09:07PM +0100, Frederic Weisbecker wrote:
> 2017-12-07 18:29 UTC+01:00, Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> > On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote:
> >> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney
> >> <paulmck@xxxxxxxxxxxxxxxxxx>:
> >> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
> >> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney
> >> >> I would prefer to keep it. It's useful for automated boot testing
> >> >> based on configs such as 0-day or -tip test machines. But I'm likely
> >> >> to migrate it to isolcpus implementation. Maybe something along the
> >> >> lines of CONFIG_CPU_ISOLATION_ALL.
> >> >
> >> > How about instead allowing something like "nohz_full=1-" specify that
> >> > all CPUs other than CPU 0 should be nohz_full CPUs? That would shrink
> >> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
> >> > easy automation of that particular scenario.
> >> >
> >> > (Right now, the boot code complains about "nohz_full=1-", which means
> >> > that whatever is generating the boot parameters needs to know how many
> >> > CPUs there really are, which as you say can be a pain.)
> >>
> >> Yes but automated boot testing is rather based on configs than boot
> >> options. It's much easier. I think that's how Wu Fengguang lab works,
> >> and -tip automated tests as well.
> >
> > So you have gotten bug reports from them? Because I see splats from
> > rcutorture testing rather frequently. This thing is in no way a subtle
> > low-probability bug. ;-)
>
> Nope I haven't got anything from them. So far you're the only
> reproducer I know :)
>
> >> >> >> Did you have any nohz_full= or isolcpus= boot options?
> >> >> >
> >> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
> >> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
> >> >> > on an eight-CPU test.
> >> >> >
> >> >> > But it is relatively easy to test. Running the rcutorture TREE04
> >> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
> >> >> > a few minutes more than half the time. ;-)
> >> >>
> >> >> Indeed I managed to trigger something. If it's the same thing I should
> >> >> be able to track down the root cause.
> >> >>
> >> >> [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 ->state 0x1 cpu 2
> >> >> [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000
> >> >> [ 123.920673] Call Trace:
> >> >> [ 123.923096] ? __schedule+0x2bf/0xbb0
> >> >> [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70
> >> >> [ 123.931657] schedule+0x3c/0x90
> >> >> [ 123.934777] schedule_timeout+0x1e1/0x560
> >> >
> >> > It might well be the same thing, as this schedule_timeout() does look
> >> > familiar. I have some diagnostic patches in -rcu, please see below
> >> > for the overall effect.
> >>
> >> I fear I can hit that even without any nohz_full CPU as well.
> >
> > Indeed, I do hit that with my TREE01 scenario, which does not set
> > CONFIG_NO_HZ_FULL. But it is much less frequent. The good news is that
> > I have finally figured out a way to extract information from this thing
> > without suppressing it. At the moment it appears to be a rather strange
> > deadlock involving CPU hotplug, timers, and RCU.
> >
> > But that is a completely different bug from the ones for which I have
> > the two patches in my tree.
> >
> > Anyway, I will keep those two patches because I cannot have the
> > corresponding bugs possibly hiding RCU bugs in my testing. If you
> > put some other fix in place, I will drop those two patches in favor of
> > your fix.
>
> Ok. I'm a bit troubled by this bug, I hate to push a fix for a bug I
> don't understand nor can reproduce.

I would be happy to talk you through running the TREE04 rcutorture
scenario, if you would like. I just verified that I can reproduce this
on a single-socket four-core (8 hardware threads) x86 system running
v4.15-rc1, so I would guess that you have access to hardware that can
reproduce it.

In the meantime, please feel free to take a look at the file
tools/testing/selftests/rcutorture/doc/initrd.txt in the kernel source.
This file tells how to create the initrd that rcutorture uses to avoid
the need to maintain root partitions. Once you have that in place in
tools/testing/selftests/rcutorture/initrd (as an expanded file tree,
not any sort of archive) the reproducer is as follows, run from the
top level of the kernel source tree:

bash tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs TREE04

This will output a bunch of rcutorture status/progress text, ending
in something like the following:

--- Sat Dec 9 09:49:26 PST 2017 Test summary:
Results directory: /home/git/linux-2.6/tools/testing/selftests/rcutorture/res/2017.12.09-09:49:26
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 7 --duration 5 --configs TREE04
TREE04 ------- 350 grace periods (1.16667 per second)
WARNING: Assertion failure in /home/git/linux-2.6/tools/testing/selftests/rcutorture/res/2017.12.09-09:49:26/TREE04/console.log
WARNING: Summary: Call Traces: 7 Stalls: 1 Starves: 1

The file at the long pathname ending in "console.log" contains the console output.

A successful run would end without the WARNING lines.
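
If you want to script that check, here is a minimal sketch in Python.
It is not part of the rcutorture tooling: the function name
summarize_rcutorture() is made up for illustration, and it assumes only
the summary format shown above (lines starting with "WARNING:", one of
which may carry "Summary:" counters).

```python
import re


def summarize_rcutorture(summary_text):
    """Return (ok, counts) for a kvm.sh test summary passed in as text.

    Hypothetical helper, not part of the rcutorture scripts. It only
    pattern-matches the summary format quoted in this email.
    """
    # Any line starting with "WARNING:" marks the run as unsuccessful.
    warnings = [line for line in summary_text.splitlines()
                if line.startswith("WARNING:")]
    counts = {}
    prefix = "WARNING: Summary:"
    for line in warnings:
        # e.g. "WARNING: Summary: Call Traces: 7 Stalls: 1 Starves: 1"
        if line.startswith(prefix):
            rest = line[len(prefix):]
            for name, value in re.findall(r"([A-Za-z ]+?):\s*(\d+)", rest):
                counts[name.strip()] = int(value)
    return (not warnings, counts)


if __name__ == "__main__":
    example = (
        "TREE04 ------- 350 grace periods (1.16667 per second)\n"
        "WARNING: Summary: Call Traces: 7 Stalls: 1 Starves: 1\n"
    )
    ok, counts = summarize_rcutorture(example)
    print("success" if ok else "failure", counts)
```

Feeding it the summary text from a run gives a pass/fail flag plus the
per-category counts, which is enough to tell a clean run from one that
reproduced the stall.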

> But having CONFIG_NO_HZ_FULL
> select CONFIG_CPU_ISOLATION is already a fix for sanity that I need to
> push. So I think I'm going to take your patch anyway and rewrite the
> changelog to take all that into account.
>
> Thanks Paul!

Works for me! I don't know of anyone else encountering this, so I don't
see it as an emergency. Left to myself, I would therefore push the
fixes into the v4.17 merge window (that is, the one after the next one).
But please let me know when you have pushed fixes, and I will adjust my
tree accordingly.

I am of course happy to test your fixes.

Thanx, Paul