Re: [PATCHv2] wifi: ath11k: mark reset srng lists as uninitialized

From: Baochen Qiang
Date: Thu Jun 12 2025 - 03:51:42 EST




On 6/12/2025 3:02 PM, Sergey Senozhatsky wrote:
> On (25/06/12 13:47), Baochen Qiang wrote:
>>> [..]
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> index 8cb1505a5a0c..cab11a35f911 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>>>  {
>>>>>  	struct ath11k_hal *hal = &ab->hal;
>>>>> +	int i;
>>>>> +
>>>>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>>>> +		ab->hal.srng_list[i].initialized = 0;
>>>>
>>>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>>>
>>> I think un-initialized lists should not be dumped.
>>>
>>> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
>>> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
>>> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
>>> causing things like:
>>
>> But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
>>
>> The sequence is ath11k_hal_dump_srng_stats() is called in reset process, then restart_work
>> is queued and in ath11k_core_restart() we call ath11k_core_reconfigure_on_crash(), there
>> ath11k_hal_srng_deinit() is called, right?
>
> My understanding is that the driver first fails to reconfigure
>
> <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
> <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
> <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
> <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
>
> so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
> which destroys the srng lists, but leaves the stale initialized flag.
> So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
> but in fact everything is not quite ok.

OK, so we hit a second crash while recovery from the first one is still in progress. My
guess is that the first recovery fails, so srng is never reinitialized. Then, after the
wait for the first recovery times out, the second recovery starts, which calls
ath11k_hal_dump_srng_stats() on the stale srng lists and hence crashes the kernel.
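
To make sure we are talking about the same sequence, here is a rough user-space model of
it. This is only a sketch and not driver code: struct model_hal, struct model_srng,
RING_ID_MAX and the model_hal_*() helpers are all made-up names, and the model only counts
the rings a dump would touch after the backing memory is gone instead of actually
dereferencing freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ring count; stands in for HAL_SRNG_RING_ID_MAX. */
#define RING_ID_MAX 4

/* Simplified stand-in for one srng entry, not the real struct hal_srng. */
struct model_srng {
	bool initialized;
	unsigned int *hp_addr;	/* points into the shared pointer area */
};

/* Simplified stand-in for the HAL state, not the real struct ath11k_hal. */
struct model_hal {
	unsigned int *ptr_area;	/* stands in for rdp.vaddr / wrp.vaddr */
	struct model_srng srng_list[RING_ID_MAX];
};

/* Allocate the pointer area and mark every ring as initialized. */
int model_hal_init(struct model_hal *hal)
{
	int i;

	assert(hal != NULL);

	hal->ptr_area = calloc(RING_ID_MAX, sizeof(*hal->ptr_area));
	if (!hal->ptr_area)
		return -1;

	for (i = 0; i < RING_ID_MAX; i++) {
		hal->srng_list[i].hp_addr = &hal->ptr_area[i];
		hal->srng_list[i].initialized = true;
	}

	return 0;
}

/*
 * Release the pointer area. With reset_flags == false the initialized
 * flags are left stale, which models the behaviour before the patch.
 */
void model_hal_deinit(struct model_hal *hal, bool reset_flags)
{
	int i;

	assert(hal != NULL);

	free(hal->ptr_area);
	hal->ptr_area = NULL;

	if (!reset_flags)
		return;

	for (i = 0; i < RING_ID_MAX; i++)
		hal->srng_list[i].initialized = false;
}

/*
 * Count the rings a stats dump would read although their backing memory
 * has already been released. Nothing is dereferenced here.
 */
int model_hal_count_stale_reads(const struct model_hal *hal)
{
	int i, stale = 0;

	assert(hal != NULL);

	for (i = 0; i < RING_ID_MAX; i++) {
		if (!hal->srng_list[i].initialized)
			continue;	/* a dump skips uninitialized rings */
		if (!hal->ptr_area)
			stale++;	/* flag says OK, memory is gone */
	}

	return stale;
}
```

In this model, calling model_hal_deinit() without resetting the flags leaves every ring
looking valid to a later dump, while resetting them makes the dump skip all of them.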

Could you please share the complete verbose kernel log? You can enable it with

modprobe ath11k debug_mask=0xffffffff
modprobe ath11k_pci

>
> Regardless of that, I do think that resetting the initialized flag
> when srng list is de-initialized/destroyed is the right thing to do.

Yeah, correct.