Re: Unrecoverable AER error when resuming from RAM (hda regression in 5.7-rc2)

From: Bjorn Helgaas
Date: Wed Apr 22 2020 - 19:21:32 EST


On Wed, Apr 22, 2020 at 11:25:04PM +0200, Takashi Iwai wrote:
> On Wed, 22 Apr 2020 22:50:28 +0200,
> Bjorn Helgaas wrote:
> > ...
> > I feel like this UR issue could be a PCI core issue or maybe some sort
> > of misuse of PCI power management, but I can't seem to get traction on
> > it.
> >
> > > Then the display freezes and the system basically falls apart (can't
> > > even sudo reboot -f, need to use magic sysrq).
> > >
> > > I bisected this to "ALSA: hda: Skip controller resume if not needed".
> > > Setting snd_hda_intel.power_save=0 resolves the issue.
> >
> > FWIW, the complete citation is c4c8dd6ef807 ("ALSA: hda: Skip
> > controller resume if not needed"),
> > https://git.kernel.org/linus/c4c8dd6ef807, which first appeared in
> > v5.7-rc2.
>
> Yes, and I posted the fix patch right now:
> https://lore.kernel.org/r/20200422203744.26299-1-tiwai@xxxxxxx
>
> The possible cause was the tricky resume code that both HD-audio
> controller (the parent PCI device) and the codec devices used.
>
> At least the patch above seems working for the reporter's machine.
> Now we need a bit more testing before merging, but it looks promising,
> so far.

Great, I'm glad you figured something out because I sure wasn't
getting anywhere!

Maybe this is a tangent, but I can't figure out what
snd_power_change_state() is doing. It *looks* like it's supposed to
change the PCI power state, but I gave up trying to figure out where
it actually touches the device.

It seems like sound has more magic in power management than other
device types, which makes me wonder whether we're providing the right
interfaces.

Bjorn