Re: Linux 4.15-rc2: Regression in resume from ACPI S3

From: Thomas Gleixner
Date: Wed Dec 13 2017 - 16:06:52 EST


On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> On Wed, 13 Dec 2017, Thomas Gleixner wrote:
> > On Wed, 13 Dec 2017, Linus Torvalds wrote:
> >
> > > On Wed, Dec 13, 2017 at 8:41 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > >
> > > > Definitely. That was fragile forever but puzzles me is that I can't figure
> > > > out what now causes that spurious interrupt to surface out of the blue.
> > >
> > > Perhaps just timing?
> >
> > That's what I'm trying to figure out right now, because that is the only
> > sensible explanation left. The whole machinery of suspend is exactly the
> > same with and without the vector changes. I instrumented all functions
> > involved and the picture is the same. I even do not see any fundamental
> > timing differences where one would say: That's it.
> >
> > What puzzles me even more is that in the range of commits I'm fiddling with
> > there is no other change than the vector management stuff and the point
> > where it breaks makes no sense at all. The point Maarten bisected it to
> > works nicely here, so that might just point to a very subtle timing issue.
>
> After doing more debugging on this it turns out that this looks like a
> legacy interrupt coming in. The vector number is always 55, which is legacy
> IRQ 7 as seen from the PIC. The corresponding IOAPIC interrupt pin is
> masked and vector 55 is completely unused.
>
> More questions than answers. Still investigating.

And it does not explain Maartens report which gets a spurious vector 33 on
CPU4 after the non boot cpus have been brought online again. And that's the
vector which was assigned before the affinity was moved by unplugging CPU4.

Hrmpf. Even more mystery to solve.

Thanks,

tglx