Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

From: matoro
Date: Thu Aug 25 2022 - 11:23:50 EST


Hello all, I know this is quite an old thread. I recently acquired some alpha hardware and have run into this exact same problem on the latest stable kernels (5.18 and 5.19). CONFIG_COMPACTION seems to be totally broken and makes userspace extremely unstable: random segfaults, corruption of glibc data structures, gcc ICEs, and so on. It seems most noticeable during tasks with heavy I/O load.

My hardware is a DS15 (Titan), so only slightly newer than the Tsunamis mentioned earlier. The problem is greatly exacerbated when using a machine-optimized kernel (CONFIG_ALPHA_TITAN) over one with CONFIG_ALPHA_GENERIC. It still doesn't go away on a generic kernel, though; it just pops up less often, usually during very I/O-heavy tasks such as checking out a tag in the kernel repo.

However, all of this seems to depend on CONFIG_COMPACTION. With it toggled off, all problems disappear, regardless of other options. I tried reverting the commit 88dbcbb3a4847f5e6dfeae952d3105497700c128 mentioned earlier in the thread (the structure has moved to a different file but was otherwise the same), but unfortunately it did not make a difference.
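
For anyone who wants to check whether a given build has the suspect options turned on, here is a rough sketch (not something posted in the thread) that scans the text of a kernel .config. The option names come from the messages in this thread; the function names and the sample config are made up for illustration, and the check only reflects the correlation reported here, not a confirmed root cause.

```python
import re

# Options the thread names as suspects.
SUSPECT_OPTIONS = ("CONFIG_COMPACTION", "CONFIG_MIGRATION", "CONFIG_BOUNCE")


def enabled_options(config_text, names=SUSPECT_OPTIONS):
    """Return the subset of `names` set to y or m in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Disabled options appear as "# CONFIG_FOO is not set" and never match.
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match and match.group(1) in names:
            enabled.add(match.group(1))
    return enabled


def looks_affected(config_text):
    """True when compaction is enabled, the setting this report ties to the instability."""
    return "CONFIG_COMPACTION" in enabled_options(config_text)


# Hypothetical sample, not taken from a real machine.
sample = """\
CONFIG_ALPHA_TITAN=y
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
# CONFIG_BOUNCE is not set
"""

print(sorted(enabled_options(sample)))
print(looks_affected(sample))
```

On a real system the text would be read from the build's .config file rather than from an inline string.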

Since this doesn't seem to have a known cause or an easy fix, would it be reasonable to just add a Kconfig dependency that disables CONFIG_COMPACTION automatically on alpha?

Thank you!

-------- Original Message --------
Subject: Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
Date: 2019-02-21 08:29
From: Jan Kara <jack@xxxxxxx>
To: Meelis Roos <mroos@xxxxxxxx>

On Thu 21-02-19 01:23:50, Meelis Roos wrote:
> > First, I found out that both the problematic alphas had memory compaction and
> > page migration and bounce buffers turned on, and working alphas had them off.
> >
> > Next, turning off these options makes the problematic alphas work.
>
> OK, thanks for testing! Can you narrow down whether the problem is due to
> CONFIG_BOUNCE or CONFIG_MIGRATION + CONFIG_COMPACTION? These are two
> completely different things so knowing where to look will help. Thanks!

Tested both.

Just CONFIG_MIGRATION + CONFIG_COMPACTION breaks the alpha.
Just CONFIG_BOUNCE has no effect in 5 tries.

OK, so page migration is problematic. Thanks for confirmation!

Honza