Re: WARNING: CPU: 0 PID: 0 at drivers/irqchip/irq-gic-v3-its.c

From: Qian Cai
Date: Fri Nov 09 2018 - 13:41:18 EST




> On Nov 9, 2018, at 12:41 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>
> On 09/11/18 17:28, Sudeep Holla wrote:
>> On Fri, Nov 9, 2018 at 4:10 PM Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>
>> [...]
>>
>>>
>>> See bb42ca474010 and d003d029cea8 for details.
>>>
>>> Now, activating this workaround leads to lockdep being really angry,
>>> most likely because the cpus_read_lock is not taken, which is a change
>>> in behaviour...
>>>
>>> I'm trying to dig into this now.
>>>
>>
>> Yes we found similar issue in kernel/sched/core.c sched_init_smp
>> There's a fix with detailed description in -next
>> (Commit 40fa3780bac2 ("sched/core: Take the hotplug lock in sched_init_smp()"))
>>
>> The behaviour changed since commit cb538267ea1e ("jump_label/lockdep:
>> Assert we hold the hotplug lock for _cpuslocked() operations")
>
> I indeed came to the same conclusion, but the fix is slightly less than
> obvious. I have the following arm64-specific crap, but it is pretty
> terrible:
>
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index f258636273c9..9e96e9eaca9b 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -36,6 +36,7 @@
>  #include <linux/clocksource.h>
>  #include <linux/clk-provider.h>
>  #include <linux/acpi.h>
> +#include <linux/cpu.h>
>
>  #include <clocksource/arm_arch_timer.h>
>
> @@ -69,7 +70,9 @@ void __init time_init(void)
>  	u32 arch_timer_rate;
>
>  	of_clk_init(NULL);
> +	cpus_read_lock();
>  	timer_probe();
> +	cpus_read_unlock();
>
>  	tick_setup_hrtimer_broadcast();
>
> Qian, can you please let me know if this helps? If it does, we'll have
> to think of something a bit better...

After applying the above patch, the original warning is gone, but there
is now a new warning:

> [ 0.000000] rcu: Offload RCU callbacks from CPUs: (none).
> [ 0.000000]
> [ 0.000000] ======================================================
> [ 0.000000] WARNING: possible circular locking dependency detected
> [ 0.000000] 4.20.0-rc1+ #10 Tainted: G T
> [ 0.000000] ------------------------------------------------------
> [ 0.000000] swapper/0/0 is trying to acquire lock:
> [ 0.000000] (____ptrval____) (acpi_probe_mutex){....}, at: __acpi_probe_device_table+0xac/0x1ec
> [ 0.000000]
> [ 0.000000] but task is already holding lock:
> [ 0.000000] (____ptrval____) (cpu_hotplug_lock.rw_sem){....}, at: time_init+0x44/0xa0
> [ 0.000000]
> [ 0.000000] which lock already depends on the new lock.
> [ 0.000000]
> [ 0.000000]
> [ 0.000000] the existing dependency chain (in reverse order) is:
> [ 0.000000]
> [ 0.000000] -> #1 (cpu_hotplug_lock.rw_sem){....}:
> [ 0.000000] __lock_acquire+0x3cc/0x858
> [ 0.000000] lock_acquire+0x124/0x330
> [ 0.000000] cpus_read_lock+0x6c/0x100
> [ 0.000000] __cpuhp_setup_state+0x38/0x78
> [ 0.000000] gic_init_bases+0x3ac/0x5d8
> [ 0.000000] gic_acpi_init+0x2cc/0x564
> [ 0.000000] acpi_match_madt+0x9c/0x15c
> [ 0.000000] acpi_table_parse_entries_array+0x3e0/0x5d8
> [ 0.000000] acpi_table_parse_entries+0xbc/0x114
> [ 0.000000] acpi_table_parse_madt+0x4c/0x80
> [ 0.000000] __acpi_probe_device_table+0x134/0x1ec
> [ 0.000000] irqchip_init+0x48/0x74
> [ 0.000000] init_IRQ+0xe4/0x12c
> [ 0.000000] start_kernel+0x4d0/0x7d4
> [ 0.000000]
> [ 0.000000] -> #0 (acpi_probe_mutex){....}:
> [ 0.000000] validate_chain.isra.19+0xcd8/0x1158
> [ 0.000000] __lock_acquire+0x3cc/0x858
> [ 0.000000] lock_acquire+0x124/0x330
> [ 0.000000] __mutex_lock+0x110/0xa68
> [ 0.000000] mutex_lock_nested+0x3c/0x50
> [ 0.000000] __acpi_probe_device_table+0xac/0x1ec
> [ 0.000000] timer_probe+0x1bc/0x254
> [ 0.000000] time_init+0x48/0xa0
> [ 0.000000] start_kernel+0x4ec/0x7d4
> [ 0.000000]
> [ 0.000000] other info that might help us debug this:
> [ 0.000000]
> [ 0.000000] Possible unsafe locking scenario:
> [ 0.000000]
> [ 0.000000]        CPU0                    CPU1
> [ 0.000000]        ----                    ----
> [ 0.000000]   lock(cpu_hotplug_lock.rw_sem);
> [ 0.000000]                                lock(acpi_probe_mutex);
> [ 0.000000]                                lock(cpu_hotplug_lock.rw_sem);
> [ 0.000000]   lock(acpi_probe_mutex);
> [ 0.000000]
> [ 0.000000] *** DEADLOCK ***
> [ 0.000000]
> [ 0.000000] 1 lock held by swapper/0/0:
> [ 0.000000] #0: (____ptrval____) (cpu_hotplug_lock.rw_sem){....}, at: time_init+0x44/0xa0
> [ 0.000000]
> [ 0.000000] stack backtrace:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G T 4.20.0-rc1+ #10
> [ 0.000000] Call trace:
> [ 0.000000] dump_backtrace+0x0/0x248
> [ 0.000000] show_stack+0x24/0x30
> [ 0.000000] dump_stack+0xb8/0xf4
> [ 0.000000] print_circular_bug.isra.15+0x240/0x368
> [ 0.000000] check_prev_add.constprop.24+0x444/0xa38
> [ 0.000000] validate_chain.isra.19+0xcd8/0x1158
> [ 0.000000] __lock_acquire+0x3cc/0x858
> [ 0.000000] lock_acquire+0x124/0x330
> [ 0.000000] __mutex_lock+0x110/0xa68
> [ 0.000000] mutex_lock_nested+0x3c/0x50
> [ 0.000000] __acpi_probe_device_table+0xac/0x1ec
> [ 0.000000] timer_probe+0x1bc/0x254
> [ 0.000000] time_init+0x48/0xa0
> [ 0.000000] start_kernel+0x4ec/0x7d4
> [ 0.000000] arch_timer: Enabling global workaround for HiSilicon erratum 161010101
> [ 0.000000] arch_timer: CPU0: Trapping CNTVCT access
> [ 0.000000] arch_timer: cp15 timer(s) running at 50.00MHz (phys).
> [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> [ 0.000002] sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
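
To spell out the inversion in the report above: init_IRQ() takes
acpi_probe_mutex and then cpus_read_lock() (chain #1), while the patched
time_init() takes cpus_read_lock() and then acpi_probe_mutex (chain #0).
Below is a minimal userspace sketch of that ordering check. It is only an
illustration, not kernel code and not how lockdep is actually implemented;
the enum names and the functions acquire_while_holding() and
replay_boot_order() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after the two locks in the report. */
enum lock_class { ACPI_PROBE_MUTEX, CPU_HOTPLUG_LOCK, NR_CLASSES };

/* dep[a][b] is true once b has been acquired while a was held. */
static bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can "to" be reached from "from" via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired while held". Returns true if the new edge would close
 * a cycle, i.e. the opposite order has already been observed.
 */
bool acquire_while_holding(enum lock_class held, enum lock_class acquired)
{
	bool seen[NR_CLASSES] = { false };

	assert(held != acquired);
	if (reachable(acquired, held, seen))
		return true;	/* inversion: leave the graph unchanged */
	dep[held][acquired] = true;
	return false;
}

/*
 * Replay the two chains from the report in boot order.
 * Bit 0 is set if the first acquisition is flagged, bit 1 if the second is.
 */
int replay_boot_order(void)
{
	/* init_IRQ(): acpi_probe_mutex held, then cpus_read_lock(). */
	bool first = acquire_while_holding(ACPI_PROBE_MUTEX, CPU_HOTPLUG_LOCK);
	/* Patched time_init(): hotplug lock held, then acpi_probe_mutex. */
	bool second = acquire_while_holding(CPU_HOTPLUG_LOCK, ACPI_PROBE_MUTEX);

	return (first ? 1 : 0) | (second ? 2 : 0);
}
```

In this model only the second acquisition is flagged, which matches the
splat showing up in time_init() rather than in init_IRQ().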