Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

From: Christian König
Date: Thu Jul 07 2022 - 03:30:59 EST


Am 06.07.22 um 22:42 schrieb Paul E. McKenney:
On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
Hello.

On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
Hi guys,

Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
All you need to do to get the previous behavior is to add something like
this to your defconfig file:

CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000

Any reason why this will not work for you?
sorry for jumping in so late, I was on vacation for a week.

Well, when any RCU period is longer than 20ms and amdgpu is in the backtrace, my
educated guess is that we messed up some timeout while waiting for the hw.

We usually do wait a few us, but it can be that somebody is waiting for ms
instead.

So there are some todos here as far as I can see, and it would be helpful to
get a cleaner backtrace if possible.

Actually, it looks like CONFIG_ANDROID is going to be removed, so CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
will not have any dependency on CONFIG_ANDROID anymore:

https://lkml.org/lkml/2022/6/29/756
But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
wish. Setting this option to 20 will get you the behavior previously
obtained by setting the now-defunct ANDROID Kconfig option.

Right. Or via a boot parameter. So for us it is not a big issue :)
Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
tuning in. ;-)

I was just about to write a response asking for that :)

Thanks, I will suggest that our QA add this parameter when running tests.
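
For reference, a minimal sketch of what the boot-parameter form mentioned above might look like when appended to the kernel command line. It assumes the parameter takes the same millisecond value as the Kconfig option, so 20 would reproduce the former ANDROID behavior and 21000 the previous one:

```
rcupdate.rcu_exp_cpu_stall_timeout=20
```

The value here only mirrors the numbers quoted in this thread; a test setup may well want something different.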

Regards,
Christian.


Thanx, Paul