Re: PROBLEM: sparc64 random crashes starting w/ Linux 6.1 (regression)

From: Nick Bowler
Date: Sun Jan 29 2023 - 20:36:23 EST


On 2023-01-29, Peter Xu <peterx@xxxxxxxxxx> wrote:
> There's a similar report previously but interestingly it was exactly
> reported against commit 0ccf7f168e17, which was the one you reported all
> good:
>
> https://lore.kernel.org/all/20221021160603.GA23307@xxxxxxxxxxxx/
>
> It's probably because for some reason the thp split didn't really happen in
> your system (maybe thp disabled?) or it should break too.

This seems an accurate assessment: CONFIG_TRANSPARENT_HUGEPAGE is not set.

> It also means 624a2c94f5b7a didn't really fix all the issues. So I assumed
> that's the only issue we had after verified with 624a2c94f5b7a on two
> existing reproducers and we assumed all issues fixed.
>
> However then with this report I looked into the whole set and I did notice
> the page migration code actually has similar problem. Sorry I should have
> noticed this even earlier. So very likely the previous two reports came
> from environment where page migration is either rare or not enabled. And
> now I suspect your system has page migration enabled.

I'd say that sounds correct too: I have CONFIG_COMPACTION=y, which selects
CONFIG_MIGRATION=y.
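
For anyone wanting to check the same options on their own build, here is a
minimal sketch that reads the text of a kernel .config and reports whether
an option is set. The helper name parse_kconfig and the sample text are
assumptions for illustration only; the sample is not a copy of the actual
.config from this system.

```python
import re


def parse_kconfig(text):
    """Map each CONFIG_ option in .config text to its value.

    Options recorded as "# CONFIG_FOO is not set" map to None.
    (Hypothetical helper, not part of any kernel tooling.)
    """
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as a specially formatted comment.
        m = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if m:
            opts[m.group(1)] = None
            continue
        # Enabled options appear as CONFIG_FOO=<value>.
        m = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if m:
            opts[m.group(1)] = m.group(2)
    return opts


# Illustrative sample only, mirroring the three options discussed here.
sample = """\
# CONFIG_TRANSPARENT_HUGEPAGE is not set
CONFIG_COMPACTION=y
CONFIG_MIGRATION=y
"""

opts = parse_kconfig(sample)
for name in ("CONFIG_TRANSPARENT_HUGEPAGE", "CONFIG_COMPACTION", "CONFIG_MIGRATION"):
    value = opts.get(name)
    print(name, "is not set" if value is None else "= " + value)
```

To run it against a real build, pass in the contents of the .config from
the kernel build directory instead of the sample string.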

> Could you try below patch to see whether it fixes your problem? It should
> cover the last piece of possible issue with dirty bit on sparc after that
> patchset. It's based on latest master branch (commit ab072681eabe1ce0).

I applied this on top of 6.2-rc6 and will give this a spin now.

Thanks,
Nick