RE: [PATCH v3 0/8] rtw88: prepare locking for SDIO support

From: Pkshih
Date: Sun Jan 23 2022 - 22:00:47 EST


Hi,

> -----Original Message-----
> From: Martin Blumenstingl <martin.blumenstingl@xxxxxxxxxxxxxx>
> Sent: Monday, January 24, 2022 3:04 AM
> To: Pkshih <pkshih@xxxxxxxxxxx>
> Cc: linux-wireless@xxxxxxxxxxxxxxx; tony0620emma@xxxxxxxxx; kvalo@xxxxxxxxxxxxxx;
> johannes@xxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Neo Jou
> <neojou@xxxxxxxxx>; Jernej Skrabec <jernej.skrabec@xxxxxxxxx>; Ed Swierk <eswierk@xxxxx>
> Subject: Re: [PATCH v3 0/8] rtw88: prepare locking for SDIO support
>
> Hi Ping-Ke,
>
> On Fri, Jan 21, 2022 at 9:10 AM Pkshih <pkshih@xxxxxxxxxxx> wrote:
> [...]
> > >
> > > I do stressed test of connection and suspend, and it get stuck after about
> > > 4 hours but no useful messages. I will re-build my kernel and turn on lockdep debug
> > > to see if it can tell me what is wrong.
> First of all: thank you so much for testing this and investigating the deadlock!
>
> > I found some deadlock:
> >
> > [ 4891.169653]        CPU0                    CPU1
> > [ 4891.169732]        ----                    ----
> > [ 4891.169799]   lock(&rtwdev->mutex);
> > [ 4891.169874]                                lock(&local->sta_mtx);
> > [ 4891.169948]                                lock(&rtwdev->mutex);
> > [ 4891.170050]   lock(&local->sta_mtx);
> >
> >
> > [ 4919.598630]        CPU0                    CPU1
> > [ 4919.598715]        ----                    ----
> > [ 4919.598779]   lock(&local->iflist_mtx);
> > [ 4919.598900]                                lock(&rtwdev->mutex);
> > [ 4919.598995]                                lock(&local->iflist_mtx);
> > [ 4919.599092]   lock(&rtwdev->mutex);
> This looks similar to the problem fixed by 5b0efb4d670c8b ("rtw88:
> avoid circular locking between local->iflist_mtx and rtwdev->mutex")
> which you have pointed out earlier.
> It seems to me that we should avoid using the mutex version of
> ieee80211_iterate_*() because it can lead to more of these issues. So
> from my point of view the general idea of the code from your attached
> patch looks good. That said, I'm still very new to mac80211/cfg80211
> so I'm also interested in other's opinions.
>

The attached patch works in most cases, because both the callers of iterate() and
::remove_interface hold rtwdev->mutex. In theory there is one exception: a caller
queues another work item to iterate() in the window after ::remove_interface
returns but before mac80211 frees the vif, and the work item runs only after
mac80211 has freed the vif. This would lead to a use-after-free, but I'm not sure
whether this scenario can actually happen. I need time to dig into this, or you
can help with it.

To avoid this, we can add a flag to struct rtw_vif and set it in
::remove_interface. Then, when we use iterate_atomic(), we only collect vifs
without this flag into the list.

A similar fix can be applied to ieee80211_sta.
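
As a rough illustration of the flag idea, here is a minimal userspace sketch,
not driver code. Everything in it is hypothetical: struct model_vif stands in
for struct rtw_vif, the "removing" field is the proposed flag, and the
model_*() helpers only mimic the two steps (collect pointers during the atomic
iteration, then run the iterator over the collected list). It does not model
rtwdev->mutex or mac80211's own locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VIFS 4	/* hypothetical limit, for the sketch only */

/* Userspace stand-in for struct rtw_vif (hypothetical layout). */
struct model_vif {
	int id;
	bool in_use;	/* slot holds a vif that was added */
	bool removing;	/* proposed flag, set in the ::remove_interface path */
};

struct model_dev {
	struct model_vif vifs[MODEL_MAX_VIFS];
};

/* Models ::remove_interface: mark the vif so later collections skip it. */
static void model_remove_interface(struct model_vif *vif)
{
	vif->removing = true;
}

/* Step 1: models the atomic iteration, which only collects pointers. */
static size_t model_collect_vifs(struct model_dev *dev,
				 struct model_vif **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	for (size_t i = 0; i < MODEL_MAX_VIFS && n < max; i++) {
		struct model_vif *vif = &dev->vifs[i];

		if (!vif->in_use || vif->removing)
			continue;	/* never hand out a vif being removed */
		out[n++] = vif;
	}
	return n;
}

/* Step 2: run the iterator over the collected list, outside step 1. */
static size_t model_iterate_vifs(struct model_dev *dev,
				 void (*iterator)(void *data,
						  struct model_vif *vif),
				 void *data)
{
	struct model_vif *list[MODEL_MAX_VIFS];
	size_t n = model_collect_vifs(dev, list, MODEL_MAX_VIFS);

	for (size_t i = 0; i < n; i++)
		iterator(data, list[i]);
	return n;
}

/* Example iterator: adds each visited vif id to an int accumulator. */
static void model_sum_ids(void *data, struct model_vif *vif)
{
	*(int *)data += vif->id;
}
```

In this model, calling model_remove_interface() on a vif makes every later
model_iterate_vifs() call skip it, which is the behaviour the flag is meant to
provide. Whether that fully closes the window in the real driver still depends
on the locking discussed above.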

> > So, I add wrappers to iterate rtw_iterate_stas() and rtw_iterate_vifs() that
> > use _atomic version to collect sta and vif, and use list_for_each() to iterate.
> > Reference code is attached, and I'm still thinking if we can have better method.
> With "better method" do you mean something like in patch #2 from this
> series (using unsigned int num_si and struct rtw_sta_info
> *si[RTW_MAX_MAC_ID_NUM] inside the iter_data) are you thinking of a
> better way in general?
>

I would like a straightforward method. For example, we could have another
version of ieee80211_iterate_xxx() and do the work in the iterator, as in the
original code, so we only need to change the code slightly.

Initially, my idea was to hold a driver lock, such as rtwdev->mutex, in both
the places where we use ieee80211_iterate_() and the places where we remove a
sta or vif. Hopefully, this ensures it is safe to run the iterator without
other locks. Then, we can define another ieee80211_iterate_() version with a
drv_lock argument, like

#define ieee80211_iterate_active_interfaces_drv_lock(hw, iter_flags, iterator, data, drv_lock) \
do { \
	lockdep_assert_held(drv_lock); \
	ieee80211_iterate_active_interfaces_no_lock(hw, iter_flags, iterator, data); \
} while (0)

The drv_lock argument keeps users from forgetting to hold a lock, and we need
a helper for the no_lock version:

void ieee80211_iterate_active_interfaces_no_lock(
	struct ieee80211_hw *hw, u32 iter_flags,
	void (*iterator)(void *data, u8 *mac,
			 struct ieee80211_vif *vif),
	void *data)
{
	struct ieee80211_local *local = hw_to_local(hw);

	__iterate_interfaces(local, iter_flags | IEEE80211_IFACE_ITER_ACTIVE,
			     iterator, data);
}

However, as I mentioned above, in theory this is not entirely safe.

So, I think the easiest way is to maintain the vif/sta lists in the driver in
::{add,remove}_interface/::sta_{add,remove}, and hold rtwdev->mutex to access
these lists. But Johannes pointed out that this is not a good idea [1].

[1] https://lore.kernel.org/linux-wireless/d61f3947cddec660cbb2a59e2424d2bd8c01346a.camel@xxxxxxxxxxxxxxxx/
--
Ping-Ke