Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")

From: Paul E. McKenney
Date: Tue Nov 13 2018 - 10:10:45 EST


On Tue, Nov 13, 2018 at 03:54:53PM +0200, Ville Syrjälä wrote:
> Hi Paul,
>
> After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
> I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
> API in terms of RCU for Tree RCU PREEMPT builds").
>
> I traced the hang into
> -> cpufreq_suspend()
> -> cpufreq_stop_governor()
> -> cpufreq_dbs_governor_stop()
> -> gov_clear_update_util()
> -> synchronize_sched()
> -> synchronize_rcu()
>
> Only PREEMPT=y is affected for obvious reasons, but that couldn't
> explain why the same UP kernel booted on an SMP machine worked fine.
> Eventually I realized that the difference between working and
> non-working machine was IOAPIC vs. PIC. With initcall_debug I saw
> that we mask everything in the PIC before cpufreq is shut down,
> and came up with the following fix:
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 7aa3dcad2175..f88bf3c77fc0 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
> return 0;
> }
> module_param(off, int, 0444);
> -core_initcall(cpufreq_core_init);
> +late_initcall(cpufreq_core_init);

Thank you for testing this and tracking it down!

I am glad that you have a fix, but I hope that we can arrive at a less
constraining one.

> Here's the resulting change in inutcall_debug:
> pci 0000:00:00.1: shutdown
> hub 4-0:1.0: hub_ext_port_status failed (err = -110)
> agpgart-intel 0000:00:00.0: shutdown
> + PM: Calling cpufreq_suspend+0x0/0x100
> PM: Calling mce_syscore_shutdown+0x0/0x10
> PM: Calling i8259A_shutdown+0x0/0x10
> - PM: Calling cpufreq_suspend+0x0/0x100
> + reboot: Restarting system
> + reboot: machine restart
>
> I didn't really look into what other ramifications the cpufreq
> initcall change might have. cpufreq_global_kobject worries
> me a bit. Maybe that one has to remain in core_initcall() and
> we could just move the suspend to late_initcall()? Anyways,
> I figured I'd leave this for someone more familiar with the
> code to figure out ;)

Let me guess...

When the system suspends or shuts down, there comes a point after which
there is only a single CPU that is running with preemption and interrupts
are disabled. At this point, RCU must change the way that it works, and
the commit you bisected to would make the change more necessary. But if
I am guessing correctly, we have just been getting lucky in the past.

It looks like RCU needs to create a struct syscore_ops with a shutdown
function and pass this to register_syscore_ops(). Maybe a suspend
function as well. And RCU needs to invoke register_syscore_ops() at
a time that causes RCU's shutdown function to be invoked in the right
order with respect to the other work in flight. The hope would be that
RCU's suspend function gets called just as the system transitions into
a mode where the scheduler is no longer active, give or take.

Does this make sense, or am I confused?

Thanx, Paul