Re: Issues with AMD microcode updates

From: Sherry Hurwitz
Date: Tue Sep 24 2013 - 19:35:21 EST


On 09/19/2013 11:44 AM, Borislav Petkov wrote:
On Thu, Sep 19, 2013 at 11:58:34AM -0300, Henrique de Moraes Holschuh wrote:
Jacob, Andreas,

I take care of the amd64 microcode update support for Debian, and I'm
receiving user reports of lockup issues with the AMD microcode driver in
several kernels. This is about the runtime update interface,
/sys/devices/system/cpu/*/microcode/reload and
/sys/devices/system/cpu/microcode/reload.

Basically, the issue is that the process that tries to write "1" to the
reload node gets stuck in "D" state on several kernel versions.
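
For illustration only, here is a minimal sketch of how such a hang could be observed from user space. It is not taken from the bug reports. It assumes Python 3 run as root; the helper names `parse_state` and `write_reload` and the timeout value are hypothetical. The write happens in a child process, so the caller does not block if the child ends up in "D" state.

```python
import subprocess
import sys
import time

# System-wide reload node named in the report; the per-CPU nodes work the same way.
RELOAD_NODE = "/sys/devices/system/cpu/microcode/reload"


def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')'.
    return stat_line.rsplit(")", 1)[1].split()[0]


def write_reload(node=RELOAD_NODE, timeout=10.0):
    """Write "1" to node from a child process.

    Returns (finished, last_state): finished is False if the child was
    still running after `timeout` seconds, and last_state is the last
    state letter read from /proc (None if it was never read).
    """
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; open(sys.argv[1], 'w').write('1')", node]
    )
    deadline = time.monotonic() + timeout
    state = None
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True, state
        try:
            with open(f"/proc/{child.pid}/stat") as f:
                state = parse_state(f.read())
        except OSError:
            pass  # child exited between poll() and open()
        time.sleep(0.2)
    return False, state
```

A result of `(False, "D")` from `write_reload()` would match the symptom described here. A child stuck in "D" state cannot be killed, so the sketch does not try to clean it up.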

I started by blacklisting several older kernels (e.g. I got a report of
2.6.38 locking up), but recently I got a report of a lockup with kernel
3.5.1. Blacklisting everything before 3.10 is not exactly kosher, not when
I would have to blindly trust 3.0, 3.2 and 3.4 to not have whatever issue is
causing the lockups.

IMHO that's the point where it becomes interesting to actually track down
the bug even if it apparently doesn't exist anymore on the more recent
kernels, and ensure that the stable/long-term kernels have the fix. That
would also help distros blacklist microcode update on the broken kernels.

Unfortunately, I don't own, or have access to, any boxes with an AMD
processor (let alone one with an AMD processor in need of a microcode
update) to bisect the problem.

I'd appreciate it if AMD (or anyone with an AMD processor, really) could help
me track this issue down.

Debian bug reports:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=717185
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=723081
Well, neither Andreas nor Jacob works for AMD anymore. I could try to
help with this but it'll be slow as I'm pretty busy with other stuff.

Anyway, I'd suggest we look only at the long-term kernels since they're
the only ones which can get updates/fixes anyway.

Now, how do I reproduce this? Writing 1 to .../reload on latest kernel
works here. So I'd need a reproducer. Alternatively, I'd need a sysrq-l
and sysrq-w from those systems with hung processes.
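
As a rough sketch of how the requested dumps could be collected on a box without a console keyboard: sysrq-l (backtrace of all active CPUs) and sysrq-w (tasks in uninterruptible, blocked state) can also be triggered by writing the command letter to /proc/sysrq-trigger as root. The helper name `trigger_sysrq` below is hypothetical, and the sketch assumes Python 3 on a kernel built with SysRq support.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(keys="lw", trigger=SYSRQ_TRIGGER):
    """Write each SysRq command letter to the trigger file, one per write."""
    for key in keys:
        # One letter per open/write so each command is issued separately.
        with open(trigger, "w") as f:
            f.write(key)
```

After calling `trigger_sysrq()`, the backtraces land in the kernel log, so the output of `dmesg` is what would need to be attached to the report.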

Thanks.

You can direct AMD microcode issues to me now.
We are setting up some systems in the lab and trying to duplicate
the problem now.

Thanks.
