Re: [PATCH 0/6] Drain remote per-cpu directly v3

From: Qian Cai
Date: Thu May 19 2022 - 09:29:57 EST


On Wed, May 18, 2022 at 10:15:03AM -0700, Paul E. McKenney wrote:
> So does this python script somehow change the tracing state? (It does
> not look to me like it does, but I could easily be missing something.)

No, I don't think so either. It pretty much just offlines memory sections
one at a time.
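
A minimal sketch of that kind of loop, assuming the standard memory hotplug
sysfs layout (/sys/devices/system/memory/memoryN/online). This is not the
actual script from this thread; the function names are hypothetical, and
skipping blocks that refuse to go offline is an assumption.

```python
import os
import re

# Standard memory hotplug sysfs directory (assumed layout).
SYSFS_MEMORY = "/sys/devices/system/memory"


def memory_blocks(base=SYSFS_MEMORY):
    """Return memory block directory names (memory0, memory1, ...) in numeric order."""
    names = [n for n in os.listdir(base) if re.fullmatch(r"memory\d+", n)]
    return sorted(names, key=lambda n: int(n[len("memory"):]))


def offline_one_at_a_time(base=SYSFS_MEMORY):
    """Try to offline each block in turn; return the names that went offline."""
    offlined = []
    for name in memory_blocks(base):
        path = os.path.join(base, name, "online")
        try:
            # Writing "0" asks the kernel to offline the block; the write
            # blocks until the offline attempt finishes or fails.
            with open(path, "w") as f:
                f.write("0")
        except OSError:
            # Blocks that cannot be offlined (e.g. EBUSY) are skipped.
            continue
        offlined.append(name)
    return offlined
```

Calling offline_one_at_a_time() as root issues one write per memory block,
which matches the write() -> online_store -> offline_pages path in the
"tee" trace below.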

> Either way, is there something else waiting for these RCU flavors?
> (There should not be.) Nevertheless, if so, there should be
> a synchronize_rcu_tasks(), synchronize_rcu_tasks_rude(), or
> synchronize_rcu_tasks_trace() on some other blocked task's stack
> somewhere.

There are only three blocked tasks when this happens. kmemleak_scan() is
just the victim, waiting for the locks taken by the stuck
offline_pages()->synchronize_rcu() task.

task:kmemleak state:D stack:25824 pid: 1033 ppid: 2 flags:0x00000008
Call trace:
__switch_to
__schedule
schedule
percpu_rwsem_wait
__percpu_down_read
percpu_down_read.constprop.0
get_online_mems
kmemleak_scan
kmemleak_scan_thread
kthread
ret_from_fork

task:cppc_fie state:D stack:23472 pid: 1848 ppid: 2 flags:0x00000008
Call trace:
__switch_to
__schedule
lockdep_recursion

task:tee state:D stack:24816 pid:16733 ppid: 16732 flags:0x0000020c
Call trace:
__switch_to
__schedule
schedule
schedule_timeout
__wait_for_common
wait_for_completion
__wait_rcu_gp
synchronize_rcu
lru_cache_disable
__alloc_contig_migrate_range
isolate_single_pageblock
start_isolate_page_range
offline_pages
memory_subsys_offline
device_offline
online_store
dev_attr_store
sysfs_kf_write
kernfs_fop_write_iter
new_sync_write
vfs_write
ksys_write
__arm64_sys_write
invoke_syscall
el0_svc_common.constprop.0
do_el0_svc
el0_svc
el0t_64_sync_handler
el0t_64_sync

> Or maybe something sleeps waiting for an RCU Tasks * callback to
> be invoked. In that case (and in the above case, for that matter),
> at least one of these pointers would be non-NULL on some CPU:
>
> 1. rcu_tasks__percpu.cblist.head
> 2. rcu_tasks_rude__percpu.cblist.head
> 3. rcu_tasks_trace__percpu.cblist.head
>
> The ->func field of the pointed-to structure contains a pointer to
> the callback function, which will help work out what is going on.
> (Most likely a wakeup being lost or not provided.)

What would be an easy way to find those out? I can't see anything
interesting in the output of sysrq-t.

> Alternatively, if your system has hundreds of thousands of tasks and
> you have attached BPF programs to short-lived socket structures and you
> don't yet have the workaround, then you can see hangs. (I am working on a
> longer-term fix.) In the short term, applying the workaround is the right
> thing to do. (Adding a couple of the BPF guys on CC for their thoughts.)

The system is pretty much idle after a fresh reboot. The only workload is
to run the script.