Re: [PATCH] staging: wfx: add gcc extension __force cast

From: Jerome Pouiller
Date: Tue Nov 12 2019 - 06:32:18 EST


Hello Al,

Thank you for your extensive review.

On Monday 11 November 2019 21:28:52 CET Al Viro wrote:
> On Mon, Nov 11, 2019 at 01:51:33PM +0000, Jules Irenge wrote:
> >
> > > NAK. force-cast (and it's not a gcc extension, BTW - it's sparse) is basically
> > > "I know better; the code is right, so STFU already". *IF* counters.count_...
> > > is really little-endian 32bit, then why isn't it declared that way? And if
> > > it's host-endian, you've just papered over a real bug here.
> > >
> > > As a general rule "fix" doesn't mean "tell it to shut up"...
> > >
> >
> > Thanks for the comments, I have updated but I have a mixed mind on the
> > __le32. I have to read more about it.
> >
> > I would appreciate if you can comment again on the update.
>
> From the look at the driver, it seems that all these counters are a part of
> structure that is read from the hardware and only used as little-endian.
> So just declare all fields in struct hif_mib_extended_count_table as
> __le32; easy enough. Looking a bit further, the same goes for
> struct hif_mib_bcn_filter_table ->num_of_info_elmts
> struct hif_mib_keep_alive_period ->keep_alive_period (__le16)
> struct hif_mib_template_frame ->frame_length (__le16)
> struct hif_mib_set_association_mode ->basic_rate_set (__le32)
> struct hif_req_update_ie ->num_i_es (__le16)
> struct hif_req_write_mib ->mib_id, ->length (__le16 both)
> struct hif_req_read_mib ->mib_id (__le16)
> struct hif_req_configuration ->length (__le16)

Indeed, structs declared in hif_api* are shared with the hardware and
should use __le16/__le32. However, as you noticed below, these structs
are sometime used in other parts of the code that are not related to
the hardware.

I have in my local queue a set of patches that improve the situation.
Objective is to limit usage of hif structs to hif_tx.c, hif_tx_mib.c
and hif_rx.c (which are correct places to handle hardware
communication). I hope to be able to submit these patches in 2 weeks.


[...]
> and that's where the real bugs start to show up; leaving the misbegotten
> forest of macros in misbegotten tracing shite aside, we have this:
>
> static const struct ieee80211_supported_band wfx_band_2ghz = {
> .channels = wfx_2ghz_chantable,
> .n_channels = ARRAY_SIZE(wfx_2ghz_chantable),
> .bitrates = wfx_rates,
> .n_bitrates = ARRAY_SIZE(wfx_rates),
> .ht_cap = {
> // Receive caps
> .cap = IEEE80211_HT_CAP_GRN_FLD | IEEE80211_HT_CAP_SGI_20 |
> IEEE80211_HT_CAP_MAX_AMSDU | (1 << IEEE80211_HT_CAP_RX_STBC_SHIFT),
> .ht_supported = 1,
> .ampdu_factor = IEEE80211_HT_MAX_AMPDU_16K,
> .ampdu_density = IEEE80211_HT_MPDU_DENSITY_NONE,
> .mcs = {
> .rx_mask = { 0xFF }, // MCS0 to MCS7
> .rx_highest = 65,
> drivers/staging/wfx/main.c:108:39: refering to this initializer.
> Sparse say that it expects rx_highest to be __le16. And that's
> not a driver-specific structure; it's generic ieee80211 one. Which
> says
> struct ieee80211_mcs_info {
> u8 rx_mask[IEEE80211_HT_MCS_MASK_LEN];
> __le16 rx_highest;
> u8 tx_params;
> u8 reserved[3];
> } __packed;
> and grepping for rx_highest through the tree shows that everything else
> is treating it as little-endian 16bit.
>
> Almost certainly a bug on big-endian hosts; should be .rx_highest = cpu_to_le16(65),
> instead.

Agree.


> Looking for more low-hanging fruits, we have
> static int indirect_read32_locked(struct wfx_dev *wdev, int reg, u32 addr, u32 *val)
> {
> int ret;
> __le32 *tmp = kmalloc(sizeof(u32), GFP_KERNEL);
>
> if (!tmp)
> return -ENOMEM;
> wdev->hwbus_ops->lock(wdev->hwbus_priv);
> ret = indirect_read(wdev, reg, addr, tmp, sizeof(u32));
> *val = cpu_to_le32(*tmp);
> _trace_io_ind_read32(reg, addr, *val);
> wdev->hwbus_ops->unlock(wdev->hwbus_priv);
> kfree(tmp);
> return ret;
> }
> with warnings about val = cpu_to_le32(*tmp); fair enough, since *val is
> host-endian (u32) and *tmp - little-endian. Trivial misannotation -
> it should've been le32_to_cpu(), not cpu_to_le32(). Same mapping on
> all CPUs we are ever likely to support, so it's just a misannotation,
> not a bug per se.

Agree.


> drivers/staging/wfx/hif_tx_mib.h:34:38: warning: incorrect type in initializer (different base types)
> drivers/staging/wfx/hif_tx_mib.h:34:38: expected unsigned char [usertype] wakeup_period_max
> drivers/staging/wfx/hif_tx_mib.h:34:38: got restricted __le16 [usertype]
>
> is about
> static inline int hif_set_beacon_wakeup_period(struct wfx_vif *wvif,
> unsigned int dtim_interval,
> unsigned int listen_interval)
> {
> struct hif_mib_beacon_wake_up_period val = {
> .wakeup_period_min = dtim_interval,
> .receive_dtim = 0,
> .wakeup_period_max = cpu_to_le16(listen_interval),
> };
> and struct hif_mib_beacon_wake_up_period has wakeup_period_max declared
> as uint8_t. We are shoving a le16 value into it. Almost certain bug -
> that will result in the listen_interval % 256 on litte-endian host and
> listen_interval / 256 on big-endian one. Looking at the callers to
> see what's actually passed as listen_interval shows only
> hif_set_beacon_wakeup_period(wvif, wvif->dtim_period, wvif->dtim_period);
> and dtim_period in *wvif (struct wfx_vif) can be assigned 0, 1 or
> values coming from struct ieee80211_tim_ie ->dtim_period or
> struct ieee80211_bss_conf ->dtim_period, 8bit in either structure.
>
> In other words, the value stored in val.wakeup_period_max will be
> always zero on big-endian hosts. Definitely bogus, should just
> store that (8bit) value as-is; cpu_to_le16() is wrong here.

Absolutely agree.


> Next piece of fun:
> static inline int hif_beacon_filter_control(struct wfx_vif *wvif,
> int enable, int beacon_count)
> {
> struct hif_mib_bcn_filter_enable arg = {
> .enable = cpu_to_le32(enable),
> .bcn_count = cpu_to_le32(beacon_count),
> };
> return hif_write_mib(wvif->wdev, wvif->id,
> HIF_MIB_ID_BEACON_FILTER_ENABLE, &arg, sizeof(arg));
> }
> Sounds like ->enable and ->bcn_count should both be __le32, which makes
> sense since the structs passed to hardware appear to be fixed-endian on
> that thing. However, annotating them as such adds warnigns:
> drivers/staging/wfx/sta.c:246:35: warning: incorrect type in assignment (different base types)
> drivers/staging/wfx/sta.c:246:35: expected restricted __le32 [assigned] [usertype] bcn_count
> drivers/staging/wfx/sta.c:246:35: got int
> drivers/staging/wfx/sta.c:249:32: warning: incorrect type in assignment (different base types)
> drivers/staging/wfx/sta.c:249:32: expected restricted __le32 [assigned] [usertype] enable
> drivers/staging/wfx/sta.c:249:32: got int
> drivers/staging/wfx/sta.c:253:32: warning: incorrect type in assignment (different base types)
> drivers/staging/wfx/sta.c:253:32: expected restricted __le32 [assigned] [usertype] enable
> drivers/staging/wfx/sta.c:253:32: got int
> drivers/staging/wfx/sta.c:262:62: warning: incorrect type in argument 2 (different base types)
> drivers/staging/wfx/sta.c:262:62: expected int enable
> drivers/staging/wfx/sta.c:262:62: got restricted __le32 [assigned] [usertype] enable
> drivers/staging/wfx/sta.c:262:78: warning: incorrect type in argument 3 (different base types)
> drivers/staging/wfx/sta.c:262:78: expected int beacon_count
> drivers/staging/wfx/sta.c:262:78: got restricted __le32 [assigned] [usertype] bcn_count
>
> All in the same function (wfx_update_filtering()) and we really do store
> host-endian values in those (first 3 places). In the last one we pass
> them to hif_beacon_filter_control(), which does expect host-endian.
> And that's the only thing we do to the instance of hif_mib_bcn_filter_enable
> in there...
>
> Possible solutions:
> 1) store them little-endian there, pass to hif_beacon_filter_control()
> already l-e, get rid of cpu_to_le32() in the latter.
> 2) store them little-endian, pass the entire pointer to struct
> instead of forming it again in hif_beacon_filter_control()
> 3) don't pretend that the objects in hif_beacon_filter_control()
> and in wfx_update_filtering() are of the same type (different layouts on
> big-endian) and replace the one in the caller with two local variables.
> My preference would be (3), as in
[...]
> but that's a matter of taste.

Yes, this is one of the difficult parts. I work on it (I opted for
solution 3).



> Next is bx.c warning about __le32; that's about num_tx_count being fed to cpu_to_le32().
> grepping for that thing results in
> drivers/staging/wfx/bh.c:106: release_count = le32_to_cpu(((struct hif_cnf_multi_transmit *)hif->body)->num_tx_confs);
> drivers/staging/wfx/hif_api_cmd.h:316: uint32_t num_tx_confs;
> drivers/staging/wfx/hif_rx.c:78: int count = body->num_tx_confs;
> which is troubling - the first line (in rx_helper()) expects to find
> a little-endian value in that field, while the last (in hif_multi_tx_confirm())
> - a host-endian, with nothing in sight that might account for conversion
> from one to another.
>
> Let's look at the call chains: hif_multi_tx_confirm() is called only as
> hif_handlers[...]->handler(), which happens in in wfx_handle_rx().
> The call is
> hif_handlers[i].handler(wdev, hif, hif->body);
> and hif has come from
> struct hif_msg *hif = (struct hif_msg *) skb->data;
> wfx_handle_rx() is called by the same rx_helper()... skb is created by
> rx_helper() and apparently filled by the call
> if (wfx_data_read(wdev, skb->data, alloc_len))
> goto err;
> right next to the allocation... and prior to the
> release_count = le32_to_cpu(((struct hif_cnf_multi_transmit *)hif->body)->num_tx_confs);
> where we expect little-endian, with nothing to modify the skb contents
> between that and the call of wfx_handle_rx(). hif in rx_helper() points
> to the same place - skb->data. OK, we almost certainly have a bug here.
>
> That thing allocates a packet and fills it with incoming data. Then
> it parses the damn thing, apparently treating the same field of the
> incoming as little-endian in one place and host-endian in another.
> In principle it's possible that the rest of the packet determines
> which one it is, but by the look of that code both places are
> hit if and only if hif->id is equal to HIF_CNF_ID_MULTI_TRANSMIT.
> It *can't* be correct on big-endian. Not even theoretically.
>
> And since it's over-the-wire data, I would expect it to be fixed-endian.
> That needs to be confirmed with the driver's authors and/or direct
> experiment on big-endian host, but I strongly suspect that the right
> fix is to have
> int count = le32_to_cpu(body->num_tx_confs);
> in hif_multi_tx_confirm() (and num_tx_confs annotated as __le32).

Indeed, num_tx_confs is always a le32 value.


> HOWEVER, that opens another nasty can of worms. We have
> struct hif_cnf_tx *buf_loc = (struct hif_cnf_tx *) &body->tx_conf_payload;
> ...
> for (i = 0; i < count; ++i) {
> wfx_tx_confirm_cb(wvif, buf_loc);
> buf_loc++;
> }
> with count derived from the packet and body pointing into the packet. And no
> visible checks that would make sure the loop won't run out of the data we'd
> actually got.
>
> The check in rx_helper() verifies that hif->len matches the amount we'd
> received; the check for ->num_tx_confs in there doesn't look like what
> we'd needed (that would be offset of body.tx_conf_payload in packet +
> num_tx_confs * sizeof(struct hif_cnf_multi_transmit) compared to
> actual size).
>
> So it smells like a remote buffer overrun, little-endian or not.
> And at that point I would start looking for driver original authors with
> some rather pointed questions about the validation of data and lack
> thereof.

There are not so much checks done on data retrieved from the hardware.
I think we can find other similar issues in the driver.

In this particular case, indeed, a little check on length of received
data could be a good idea.


> BTW, if incoming packets are fixed-endian, I would expect more bugs on
> big-endian hosts - wfx_tx_confirm_cb() does things like
> tx_info->status.tx_time =
> arg->media_delay - arg->tx_queue_delay;
> with media_delay and tx_queue_delay both being 32bit fields in the
> incoming packet. So another question to the authors (or documentation,
> or direct experiments) is what endianness do various fields in the incoming
> data have. We can try and guess, but...

Fortunately, answer is simple enough: everything from hardware is
little endian :).

Jules, do you want to take care of fixing theses issues (except the one
about wfx_update_filtering())?


--
Jérôme Pouiller