Re: AVX "Sandy Bridge" hardware issue?

From: Andy Lutomirski
Date: Wed Jul 20 2011 - 09:55:20 EST


On 07/12/2011 04:16 PM, MK wrote:
Hi gang! I'd forgotten how busy this list is; I hope someone can help
me out.

I have a small VPS slice, running under OpenVZ, that I use for testing
and personal projects. Recently, the provider migrated to new Xeon
"Sandy Bridge" processors, which according to Wikipedia are the first
and thus far only commercially available processors with AVX.

After the migration, a number of my Apache mod_perl applications
started dying with SIGILL. Reproducible test case:

use Apache2::Const qw(SERVER_ERROR);

sub handler {
    return SERVER_ERROR;
}

Apache2::Const is the indirect culprit here; if I remove it and just
return 500, the module works. Note that this is not a Perl error. A
backtrace from running Apache under gdb and triggering the issue is here:

http://pastebin.com/16SrEzHM

I posted this to the mod_perl list and someone pointed me to a
backtrace identical in its final contexts, from a glibc bug
reported last year:

http://sourceware.org/bugzilla/show_bug.cgi?format=multiple&id=12113

Which involves AVX hardware. The VPS provider has provided me with a
bare Fedora 14 slice for debugging this issue, and the "small
reproducer" available from the above bug report, verified by Ulrich
Drepper, does reproduce the issue.

So I filed a glibc bug with Fedora to that effect:

https://bugzilla.redhat.com/show_bug.cgi?id=720176

There, Andreas Schwab points out (rightly or wrongly) that according
to the /proc/cpuinfo from the slice, the processor does not actually
support AVX. However, the "model name", "Intel(R) Xeon(R) CPU
E31230", is listed here as a Sandy Bridge processor with AVX:

http://en.wikipedia.org/wiki/Sandy_Bridge#Server_processors

And while I do not have access to the hardware, the provider is
unequivocal that these are Sandy Bridges, which apparently include
AVX.

So I am looking for a next step to take in debugging this. The kernel
used on the slice (NB: OpenVZ does not allow rolling your own) is
2.6.32, built with gcc 4.1.2. I think both may predate AVX support in
the kernel and gcc, but the glibc is 2.13, which apparently does
include it.

Does anyone have any idea why I would get this identical backtrace, and
a failed reproducer test, on hardware which supposedly supports AVX
(but not according to the kernel in /proc/cpuinfo)?
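To see what the kernel itself reports, one option is to scan the
"flags" line of /proc/cpuinfo for the "avx" token. The following is a
minimal sketch, not anything from the bug reports: the helper names
flags_contain() and cpuinfo_has_flag() are made up for illustration,
and it assumes the x86 layout where the line starts with "flags".

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if `flag` appears as a whole
 * whitespace-separated token in `flags` (so "avx" does not match "avx2"). */
static inline bool flags_contain(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    if (n == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[n] == '\0' || p[n] == ' ' ||
                    p[n] == '\t' || p[n] == '\n';
        if (starts && ends)
            return true;
        p += n;
    }
    return false;
}

/* Hypothetical helper: returns 1 if the first "flags" line in `path`
 * lists `flag`, 0 if it does not, and -1 if the file cannot be opened. */
static inline int cpuinfo_has_flag(const char *path, const char *flag)
{
    char line[8192];
    int result = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "flags", 5) == 0) {
            const char *colon = strchr(line, ':');
            if (colon != NULL && flags_contain(colon + 1, flag))
                result = 1;
            break; /* every core lists the same flags; one line is enough */
        }
    }
    fclose(f);
    return result;
}
```

Calling cpuinfo_has_flag("/proc/cpuinfo", "avx") from a small main()
would show whether the running kernel advertises AVX, which is a
separate question from whether the silicon implements it.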

I was bored and read the manual. It looks like glibc is buggy: it
checks whether the CPU supports AVX (the CPUID feature bit) but not
whether the OS has enabled AVX (the OSXSAVE bit, plus the XCR0 state
bits read with XGETBV).

http://sourceware.org/bugzilla/show_bug.cgi?id=13007
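
For reference, the check described in the Intel manual looks roughly
like the sketch below. It is not glibc's code; the function names are
invented for illustration, and the hardware-facing part assumes GCC or
Clang on x86 (for <cpuid.h> and inline asm). The bit positions are the
documented ones: CPUID.1:ECX bit 27 (OSXSAVE) and bit 28 (AVX), and
XCR0 bits 1 (SSE state) and 2 (AVX state).

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID_ECX_AVX     (1u << 28) /* CPU implements AVX */
#define XCR0_SSE_STATE    (1u << 1)  /* OS saves/restores XMM registers */
#define XCR0_AVX_STATE    (1u << 2)  /* OS saves/restores upper YMM halves */

/* Pure decision logic (hypothetical name): AVX is usable only if the CPU
 * has it, the OS has turned on XSAVE, and XCR0 enables both state bits. */
static inline bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;

    if (!(cpuid1_ecx & CPUID_ECX_AVX))
        return false;
    if (!(cpuid1_ecx & CPUID_ECX_OSXSAVE))
        return false;
    return (xcr0 & need) == need;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Reads the real registers (hypothetical name). XGETBV is executed only
 * after OSXSAVE is confirmed, because it faults otherwise. */
static inline bool avx_usable_on_this_cpu(void)
{
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    if (!(ecx & CPUID_ECX_OSXSAVE))
        return false;

    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return avx_usable(ecx, ((uint64_t)hi << 32) | lo);
}
#endif
```

On a kernel that never enables the AVX state, the CPUID AVX bit is
still set by the hardware, so a test of that bit alone says yes while
avx_usable() says no.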

That being said, you should still bug your provider for a better kernel. AVX is useful and should be enabled.

--Andy