Re: [PATCH v4 7/8] lockdep: Change hardirq{s_enabled,_context} to per-cpu variables

From: Peter Zijlstra
Date: Wed Jun 24 2020 - 05:02:25 EST


On Tue, Jun 23, 2020 at 10:24:04PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 23, 2020 at 08:12:32PM +0200, Peter Zijlstra wrote:
> > Fair enough; I'll rip it all up and boot a KCSAN kernel, see what if
> > anything happens.
>
> OK, so the below patch doesn't seem to have any nasty recursion issues
> here. The only 'problem' is that lockdep now sees report_lock can cause
> deadlocks.
>
> It is completely right about it too, but I don't suspect there's much we
> can do about it, it's pretty much the standard printk() with scheduler
> locks held report.

So I've been getting tons and tons of this:

[ 60.471348] ==================================================================
[ 60.479427] BUG: KCSAN: data-race in __rcu_read_lock / __rcu_read_unlock
[ 60.486909]
[ 60.488572] write (marked) to 0xffff88840fff1cf0 of 4 bytes by interrupt on cpu 1:
[ 60.497026] __rcu_read_lock+0x37/0x60
[ 60.501214] cpuacct_account_field+0x1b/0x170
[ 60.506081] task_group_account_field+0x32/0x160
[ 60.511238] account_system_time+0xe6/0x110
[ 60.515912] update_process_times+0x1d/0xd0
[ 60.520585] tick_sched_timer+0xfc/0x180
[ 60.524967] __hrtimer_run_queues+0x271/0x440
[ 60.529832] hrtimer_interrupt+0x222/0x670
[ 60.534409] __sysvec_apic_timer_interrupt+0xb3/0x1a0
[ 60.540052] asm_call_on_stack+0x12/0x20
[ 60.544434] sysvec_apic_timer_interrupt+0xba/0x130
[ 60.549882] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 60.555621] delay_tsc+0x7d/0xe0
[ 60.559226] kcsan_setup_watchpoint+0x292/0x4e0
[ 60.564284] __rcu_read_unlock+0x73/0x2c0
[ 60.568763] __unlock_page_memcg+0xda/0xf0
[ 60.573338] unlock_page_memcg+0x32/0x40
[ 60.577721] page_remove_rmap+0x5c/0x200
[ 60.582104] unmap_page_range+0x83c/0xc10
[ 60.586582] unmap_single_vma+0xb0/0x150
[ 60.590963] unmap_vmas+0x81/0xe0
[ 60.594663] exit_mmap+0x135/0x2b0
[ 60.598464] __mmput+0x21/0x150
[ 60.601970] mmput+0x2a/0x30
[ 60.605176] exit_mm+0x2fc/0x350
[ 60.608780] do_exit+0x372/0xff0
[ 60.612385] do_group_exit+0x139/0x140
[ 60.616571] __do_sys_exit_group+0xb/0x10
[ 60.621048] __se_sys_exit_group+0xa/0x10
[ 60.625524] __x64_sys_exit_group+0x1b/0x20
[ 60.630189] do_syscall_64+0x6c/0xe0
[ 60.634182] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 60.639820]
[ 60.641485] read to 0xffff88840fff1cf0 of 4 bytes by task 2430 on cpu 1:
[ 60.648969] __rcu_read_unlock+0x73/0x2c0
[ 60.653446] __unlock_page_memcg+0xda/0xf0
[ 60.658019] unlock_page_memcg+0x32/0x40
[ 60.662400] page_remove_rmap+0x5c/0x200
[ 60.666782] unmap_page_range+0x83c/0xc10
[ 60.671259] unmap_single_vma+0xb0/0x150
[ 60.675641] unmap_vmas+0x81/0xe0
[ 60.679341] exit_mmap+0x135/0x2b0
[ 60.683141] __mmput+0x21/0x150
[ 60.686647] mmput+0x2a/0x30
[ 60.689853] exit_mm+0x2fc/0x350
[ 60.693458] do_exit+0x372/0xff0
[ 60.697062] do_group_exit+0x139/0x140
[ 60.701248] __do_sys_exit_group+0xb/0x10
[ 60.705724] __se_sys_exit_group+0xa/0x10
[ 60.710201] __x64_sys_exit_group+0x1b/0x20
[ 60.714872] do_syscall_64+0x6c/0xe0
[ 60.718864] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 60.724503]
[ 60.726156] Reported by Kernel Concurrency Sanitizer on:
[ 60.732089] CPU: 1 PID: 2430 Comm: sshd Not tainted 5.8.0-rc2-00186-gb4ee11fe08b3-dirty #303
[ 60.741510] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 60.752957] ==================================================================

And I figured a quick way to get rid of that would be something like the
below, seeing how volatile accesses get treated as marked automatically...
but that doesn't actually seem to work.

What am I missing?
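
For reference, a minimal userspace sketch of the three helpers the diff below
touches, spelled with explicit marked accesses rather than volatile casts. The
READ_ONCE()/WRITE_ONCE() macros here are simplified stand-ins (plain volatile
casts, relying on the GCC/Clang __typeof__ extension), not the kernel's
definitions; struct task and the current macro are hypothetical placeholders
for task_struct and the real current. It only shows the shape of the accesses
and makes no claim about what KCSAN would or would not report.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel macros (assumption: volatile cast only). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical placeholder for the one task_struct field used here. */
struct task {
	int rcu_read_lock_nesting;
};

static struct task current_task;
#define current (&current_task)

/* Increment the nesting depth as one marked load plus one marked store. */
static void rcu_preempt_read_enter(void)
{
	WRITE_ONCE(current->rcu_read_lock_nesting,
		   READ_ONCE(current->rcu_read_lock_nesting) + 1);
}

/* Decrement the nesting depth and return the new value. */
static int rcu_preempt_read_exit(void)
{
	int ret = READ_ONCE(current->rcu_read_lock_nesting) - 1;

	WRITE_ONCE(current->rcu_read_lock_nesting, ret);
	return ret;
}

/* Set the nesting depth with a single marked store. */
static void rcu_preempt_depth_set(int val)
{
	WRITE_ONCE(current->rcu_read_lock_nesting, val);
}

/* Enter twice, exit twice; returns the final depth. */
static int nesting_roundtrip(void)
{
	rcu_preempt_depth_set(0);
	rcu_preempt_read_enter();
	rcu_preempt_read_enter();
	assert(current->rcu_read_lock_nesting == 2);
	assert(rcu_preempt_read_exit() == 1);
	return rcu_preempt_read_exit();
}
```

Calling nesting_roundtrip() leaves the depth balanced at zero. Note that the
increment and decrement are still separate load and store steps in this form,
same as with the volatile casts; nothing here makes them atomic.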



diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 352223664ebd..b08861118e1a 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -351,17 +351,17 @@ static int rcu_preempt_blocked_readers_cgp(struct rcu_node *rnp)

 static void rcu_preempt_read_enter(void)
 {
-	current->rcu_read_lock_nesting++;
+	(*(volatile int *)&current->rcu_read_lock_nesting)++;
 }
 
 static int rcu_preempt_read_exit(void)
 {
-	return --current->rcu_read_lock_nesting;
+	return --(*(volatile int *)&current->rcu_read_lock_nesting);
 }
 
 static void rcu_preempt_depth_set(int val)
 {
-	current->rcu_read_lock_nesting = val;
+	WRITE_ONCE(current->rcu_read_lock_nesting, val);
 }
 
 /*