Re: Bisected post-3.9 regression: Resume takes 5 times as much time as with v3.9

From: Paul E. McKenney
Date: Sun May 12 2013 - 16:58:04 EST


On Sun, May 12, 2013 at 08:29:35PM +0200, Joerg Roedel wrote:
> Hi Paul,
>
> On Sun, May 12, 2013 at 04:31:57AM -0700, Paul E. McKenney wrote:
> > On Sat, May 11, 2013 at 08:04:50PM +0200, Bjørn Mork wrote:
> > > Bisecting it ended up pointing to
> > >
> > > commit c0f4dfd4f90f1667d234d21f15153ea09a2eaa66
> > > Author: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> > > Date: Fri Dec 28 11:30:36 2012 -0800
> > >
> > > rcu: Make RCU_FAST_NO_HZ take advantage of numbered callbacks
> > >
> > > Because RCU callbacks are now associated with the number of the grace
> > > period that they must wait for, CPUs can now take advance callbacks
> > > corresponding to grace periods that ended while a given CPU was in
> > > dyntick-idle mode. This eliminates the need to try forcing the RCU
> > > state machine while entering idle, thus reducing the CPU intensiveness
> > > of RCU_FAST_NO_HZ, which should increase its energy efficiency.
> > >
> > > Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > >
> > >
> > >
> > > Being a big patch, I'm pretty sure that the problem is some minor
> > > issue. But rather than trying to understand this, I just tried reverting
> > > it on top of the current mainline and can confirm that this fixes the
> > > regression. I'll leave the understanding to you :)
> > >
> > > I'm attaching the revert patch as I had to fix a conflict, and may have
> > > done something wrong there. I'm also attaching my .config.
> > >
> > > Let me know if you need more information, or want me to try out proposed
> > > fixes.
> >
> > We don't want to back out the RCU_FAST_NO_HZ changes due to their
> > energy-efficiency benefits. So could you please try out Borislav's
> > patch below? He ran into the same issue a few weeks ago, and this
> > one fixed it for him.
>
> I get a ~10min boot delay with this patch:
>
> [ 1.149676] system 00:01: [mem 0xf6000000-0xf6003fff] could not be reserved
> [ 1.149724] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
> [ 603.957670] pnp 00:02: [dma 4]
> [ 603.957735] pnp 00:02: Plug and Play ACPI device, IDs PNP0200 (active)
>
> This happens on my AMD FX-6100 system. I bisected the problem down to the same
> commit and reverting it fixes the problem. Any ideas?

That does look pretty extreme! If you build with CONFIG_RCU_NO_HZ=n,
but without the revert, do you still get the delays?

Thanx, Paul
