Re: nios2 crash/hang in mainline due to 'lib: update LZ4 compressor module'

From: Guenter Roeck
Date: Thu Mar 02 2017 - 22:06:15 EST


On 03/02/2017 08:38 AM, Tobias Klauser wrote:
On 2017-03-01 at 20:45:21 +0100, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
On Wed, Mar 01, 2017 at 07:58:17PM +0100, Sven Schmidt wrote:
Hi Guenter, Tobias and Sandra,

thanks for your effort here.

On Tue, Feb 28, 2017 at 10:14:13AM -0800, Guenter Roeck wrote:
On Tue, Feb 28, 2017 at 10:53:56AM -0700, Sandra Loosemore wrote:
On 02/28/2017 08:53 AM, Tobias Klauser wrote:
(adding Sandra Loosemore to Cc due to possible relation to gcc/binutils
for nios2)

On 2017-02-26 at 22:03:38 +0100, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
Hi Sven,

my qemu test for nios2 started failing with commit 4e1a33b105dd ("lib:
update LZ4 compressor module"). The test hangs early during boot before
any console output is seen. Reverting the offending patch as well as the
subsequent lz4 related patches fixes the problem. Disabling CONFIG_RD_LZ4
and with it other LZ4 options also fixes it (as does adding "return -EINVAL;"
at the top of the LZ4 decompression code). For reference, bisect log
is attached.

I tried with buildroot toolchains using gcc 6.1.0 as well as 6.3.0
and binutils 2.26.1. Scripts used to run the tests are available at
https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2.
Qemu is from qemu mainline or qemu v2.8 with nios2 patches applied.

Looks like this is somehow related to gcc/binutils. Using GCC 4.8.3 and
binutils 2.24.51 (both from from Sourcery CodeBench Lite 2014.05) I can
get a kernel booting on latest master branch. AFAICT, none of the
LZ4_decompress_* functions are called during boot.


It seems a bit strange that code which is not actually called causes problems like that.

Yes, it is, though it is always possible. The code isn't exactly easy to
understand; there may be some hidden caveats such as global variables. It may
also be that some jump target exceeds its range (though why that would only
be seen with the LZ4 code is another question), or that the compiler gets
confused by the forced inlines (disabling that didn't make a difference,
though, nor did disabling -O3).

Please let me know if and how I may help you figure out what's happening, especially
regarding the differences between the previous LZ4 and the current implementation.


For my part I am all but clueless. Unless someone has an idea, we may to
disable LZ4 support for nios2 for the time being. Does anyone have thoughts
on that ? Of course, that would not help if the problem also affects
recent gcc/binutil versions on other architectures.

After some further investigations, I'd say this isn't "caused" by LZ4
specifically but by a more general problem with one of the nios2 arch
specific tools involved.

I manually enabled random additional CONFIG_* options and in some cases
I got the kernel to boot (with CONFIG_RD_LZ4 enabled and no return
-EINVAL in place) while in others I didn't. So I'd rather suspect this
problem to be connected to the size or structure of the generated vmlinux
image.

Or could this even be a problem with qemu? Did anyone already verify
this on the 10m50 devboard? (Unfortunately I don't have any nios2
devboard available right now, otherwise I would have done this...)


That is of course always possible.

Other than that I'm also becoming all but clueless... One option I
thought of was using the QEMU monitor to dump the CPU state after the
hang but so far I didn't manage to get it to work (hints appreciated ;)


Something like

qemu-system-nios2 -M 10m50-ghrd -kernel vmlinux -no-reboot \
-dtb arch/nios2/boot/dts/10m50_devboard.dtb \
--append "rdinit=/sbin/init" -initrd busybox-nios2.cpio

gives you a qemu monitor window. Use "info registers" to see registers.
Looks like it is stuck in init_bootmem_core, or at least that is what it
shows for me.

Guenter