mac80211 No ProbeResp drives me bonkers

From: barry bouwsma
Date: Fri Jul 04 2008 - 11:10:42 EST


Moin moin!

The newer mac80211 code has caused me grief in my attempts to use
post-2.6.24 kernels, accessing a somewhat remote AP, for which the
old softmac code (plus at least one hack) worked ``reliably'' or
``well'', or ``kinda worked'' and ``maybe''.

The most annoying problem is that the following section of code is
triggered far too often -- anywhere from hourly intervals to less
than a minute, depending on something -- what, I don't know.

Code, with hack that has let me remain online overnight, even during
a bit of what must suffice for ``sleep'':
as last seen in net/mac80211/mlme.c ...


	if (time_after(jiffies,
		       sta->last_rx + IEEE80211_MONITORING_INTERVAL)) {
		if (ifsta->flags & IEEE80211_STA_PROBEREQ_POLL) {
			printk(KERN_DEBUG "%s: No ProbeResp from "
			       "current AP %s - assume out of "
			       "range\n",
			       dev->name, print_mac(mac, ifsta->bssid));
#if 0 /* XXX AAAGH this seems to kick me off too much, KILL */
			disassoc = 1;
			sta_info_unlink(&sta);
		} else
#endif /* XXX cause of high blood pressure */
		} /* XXX HACK */
		ieee80211_send_probe_req(dev, ifsta->bssid,
					 local->scan_ssid,


This from a recent (week-old-ish) 2.6.26-rc8 kernel.

I've added no further debuggery to see exactly what's going on.
A hypothesis I have is that this sometimes-weak (but not since I
added the above XXX-pr0n hack, all hail Murphy, too bad I didn't
*need* uninterrupted 'net access during this time) AP sends some
data that gets lost due to one or more of the following:

* rapid fluctuations in signal strength, related to the fact that
the Beacons sent are sometimes not receivable for hours

* the fact this is not an isolated network, with many APs sending
Beacons on this channel, as well as even more operating on nearby
channels (overlap), so collisions are inevitable. Since I was
repeatedly kicked off at intervals of a minute or so when I was
trying to take care of ``important'' ``work'' one afternoon, I
suspect that was due to corrupted packets. Maybe other ``people''
were online and interfering with me packets. I could reassociate
immediately at signal levels which qualify as ``never seen better''
from that AP, so it wasn't signal fade.

* something else blindingly obvious that I'm not seeing.


I'm guessing there are no retries in the code (haven't worked up the
courage and awakeness to actually check) or that the retries happen
within too short a time interval to make a difference -- which is
another Issue I have with mac80211 compared with softmac with this
particular AP (more later about that, maybe).
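
To make the ``few retries'' idea concrete, here is a minimal sketch
in plain userspace C. It is not mac80211 code: struct probe_poll,
probe_poll_tick() and PROBE_MAX_MISSES are hypothetical names made up
for illustration, and the limit of 3 is an arbitrary assumption. The
idea is simply to give up on the AP only after several consecutive
polls go unanswered, rather than after the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit, NOT a mac80211 constant: how many consecutive
 * unanswered polls are tolerated before giving up on the AP. */
#define PROBE_MAX_MISSES 3

struct probe_poll {
	int misses;	/* consecutive polls without any answer */
};

/*
 * Call once per monitoring interval.
 * got_response: true if anything was heard from the AP since the
 * previous call.
 * Returns true when the caller should treat the AP as out of range.
 */
static bool probe_poll_tick(struct probe_poll *p, bool got_response)
{
	assert(p != NULL);

	if (got_response) {
		p->misses = 0;	/* any answer resets the count */
		return false;
	}

	p->misses++;
	return p->misses >= PROBE_MAX_MISSES;
}
```

With this shape, a single lost ProbeResp costs nothing; only a run of
PROBE_MAX_MISSES silent intervals in a row leads to a disassociation.
Whether that would actually help with this AP is untested.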


With the above hack, I've been online for longer than would have been
possible without it; I've logged the times I would have been kicked
off. My fear, apparently unfounded, was that I'd be syslog-bombed
by these messages, but so far they appear only occasionally, though
often enough that each forced disconnect would have been annoying.
Still waiting to see how it handles the signal dropping below the
point of communication.


[193012.201642] wlan1: RX deauthentication from 00:...
[normal reassociation...]
[193013.206729] wlan1: associated
[208709.210335] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[210407.210307] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[214612.381539] wlan1: RX deauthentication from 00:...
[214613.386331] wlan1: associated
[215812.391525] wlan1: RX deauthentication from 00:...
[215813.398349] wlan1: associated
[217012.401289] wlan1: RX deauthentication from 00:...
[217013.409378] wlan1: associated
[218212.412629] wlan1: RX deauthentication from 00:...
[218213.419281] wlan1: associated
[219089.410133] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219141.410157] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219153.410152] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219229.410140] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219412.421206] wlan1: RX deauthentication from 00:...
[219413.427770] wlan1: associated
[219427.420189] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219445.420157] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219531.420166] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219621.420130] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219631.420146] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[219991.420157] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[221212.437209] wlan1: RX deauthentication from 00:...
[221213.435806] wlan1: associated
[222412.448670] wlan1: RX deauthentication from 00:...
[222413.447710] wlan1: associated
[223612.458562] wlan1: RX deauthentication from 00:...
[223613.457670] wlan1: associated
[224812.468496] wlan1: RX deauthentication from 00:...
[224813.471239] wlan1: associated
[226012.478078] wlan1: RX deauthentication from 00:...
[226013.480571] wlan1: associated
[227212.487556] wlan1: RX deauthentication from 00:...
[227213.489715] wlan1: associated
[228412.497499] wlan1: RX deauthentication from 00:...
[228413.496655] wlan1: associated
[229101.490133] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[229165.490136] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[229191.490159] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[229612.508539] wlan1: RX deauthentication from 00:...
[229613.506668] wlan1: associated
[232289.500159] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[238241.500151] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[238413.500163] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range

That brings us up-to-date and I'm still online and I've needed to
do nothing to remain that way.

In the worst case so far, this occurred twice within ten seconds.
There's no obvious predictability as to when the ProbeResp fails.
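
For anyone wanting to check the spacing of these messages in their own
logs, a small sketch follows. It assumes dmesg lines carry the usual
[seconds.microseconds] prefix as in the excerpt above; the function
name min_probe_gap() is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the smallest gap, in seconds, between consecutive
 * "No ProbeResp" lines in the given dmesg lines, or -1.0 if fewer
 * than two such lines were found. Other lines are skipped.
 */
static double min_probe_gap(const char *const lines[], size_t n)
{
	double prev = -1.0, best = -1.0;
	size_t i;

	for (i = 0; i < n; i++) {
		double t;

		assert(lines[i] != NULL);
		if (!strstr(lines[i], "No ProbeResp"))
			continue;
		/* timestamp looks like "[219621.420130]" */
		if (sscanf(lines[i], "[%lf]", &t) != 1)
			continue;
		if (prev >= 0.0 && (best < 0.0 || t - prev < best))
			best = t - prev;
		prev = t;
	}
	return best;
}
```

Fed the three messages stamped 219531.420166, 219621.420130 and
219631.420146 from the log above, the smallest gap comes out at
roughly ten seconds, which is the worst case mentioned.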

Ignoring it, as I've done, is not the right way to proceed, but
unless I either read the code or add further debugging, I can't
say whether a few retries would help me, or if I should simply
await the inevitable timeouts which I assume happen later.


This probably isn't a problem with a strong local net, or perhaps
an isolated net without other potentially-interfering APs, but I
know too little about wireless networking protocols and modulation
techniques to speak authoritatively.


*update*
Ah, now we have a weak signal: seems to be 4 to 6 second intervals
between syslog messages, and again I'm back in business... aaaand,
I'm out of range, but still see signal levels -- no syslog-bombing
as feared; signal unusable, syslog has stopped, awaiting usual kernel
panic, nothing yet, ...

Yay! Third complaint about mac80211 is identified (an ``unusable''
signal would disappear from /proc/net/wireless so I was no longer
able to monitor it easily to determine when it was safe to attempt
to reassociate and continue...)
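
As an aside on that monitoring: a minimal sketch of pulling the link
and level figures out of one /proc/net/wireless data line, assuming
the usual Wireless Extensions layout of "iface: status link level
noise ...". The helper name parse_wireless_line() is made up here, and
the sample line in the comment is illustrative, not copied from my
machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one data line of /proc/net/wireless, for example
 *   " wlan1: 0000   54.  -56.  -256        0      0      0"
 * (illustrative values). Returns true and fills *link and *level
 * only if the line parses and belongs to ifname; the two header
 * lines of the file fail to parse and yield false.
 */
static bool parse_wireless_line(const char *line, const char *ifname,
				double *link, double *level)
{
	char name[32];
	unsigned int status;
	double l, v;

	assert(line != NULL && ifname != NULL);
	assert(link != NULL && level != NULL);

	if (sscanf(line, " %31[^: ]: %x %lf %lf",
		   name, &status, &l, &v) != 4)
		return false;
	if (strcmp(name, ifname) != 0)
		return false;

	*link = l;
	*level = v;
	return true;
}
```

Looping this over the lines of the file once a second would be enough
to tell when the interface entry vanishes and when it comes back.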

And in spite of all this, I've automagically reassociated and
can ping the AP, even after tens of minutes of weak signal.


I conclude that the above code might be useful or necessary for
wardriving, where a fading AP might be replaced by something better,
but for war-settin'-on-me-arse it introduces obstacles; even for
war-I-have-an-AP-and-I-ain't-afraid-to-use-it there may well be
problems through reinforced concrete walls or in large cities/
housing concentrations with lotsa WLANs all on the same channel.


[much time passes]

And some hours later, I've lost my association but nevertheless the
signal quality remains available through /proc/net/wireless, which
hints that I can successfully reassociate, and indeed that is the
case, after a few hours of sleep and heavy rain. Joy oh joy.

Further dmesg scraps:
[252413.906181] wlan1: associated
[252437.900171] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[252473.900150] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[252489.900164] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[252507.900156] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[252767.900142] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[252771.900155] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[...]
[253999.900163] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[254003.900221] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[254007.900151] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[254011.900322] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[254015.900224] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[254019.900163] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[254023.900146] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[time passes, I sleep, with open window, kitchen flooded, etc. etc.]
[I wake up, ping fails, I manually reassociate to do further ``work''...]
[266211.365705] wlan1: Initial auth_alg=0
[266211.365705] wlan1: authenticate with AP 00:...
[266211.370105] wlan1: associated
[further messages as I'm writing this:]
[268233.380152] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[268239.380144] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[268443.380153] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range
[268525.380158] wlan1: No ProbeResp from current AP 00:[CENSORED] - assume out of range

Connection still works and all so far. Draw your own conclusions.

At worst this message is logged every four seconds, during the time I
was asleep and the remote AP was presumably indeed unreachable. After
this, some sort of natural decay takes place and the connection becomes
stale, yet with my hack, I'm still able to readily monitor signal quality.

Had I been wanting to download pr0n^W^W work this last hour since
waking (not a typo), I would have been kicked off some six times
and I might have decided to mop up the kitchen instead.


thanks,
barry bouwsma



