Re: Please consider reverting 7d930bc33653d5592dc386a76a38f39c2e962344
From: Luis R. Rodriguez
Date: Tue Nov 03 2009 - 12:24:36 EST
On Tue, Nov 3, 2009 at 8:29 AM, Ingo Molnar <mingo@xxxxxxx> wrote:
> * Marcel Holtmann <marcel@xxxxxxxxxxxx> wrote:
>> Hi Linus,
>> > > no questions that it needs fixed, I agree with you. However just blindly
>> > > reverting something, because it fixes it for one or two people, might
>> > > have side effects that causes more problems than the revert would
>> > > actually fix.
>> > Stop whining. Really.
>> > Everybody understands that it should be fixed. ÂThat's not the question.
>> > But it should be fixed _quickly_. In this case, I have a bisection report
>> > FROM TWO DAYS AGO. And I'm still kicking myself for not just reverting
>> > that piece-of-shit commit then, because I spent the time to look at the
>> > oops and the commit, and could tell that it was crap.
>> > Instead, I _did_ wait for the subsystem maintainer to get around to it. As
>> > a result of waiting, I've now wasted time for a lot of other people.
>> I do have a patch in my inbox from Johannes from 4 days ago that fixes
>> this issue.
>> Â Â Â http://marc.info/?l=linux-wireless&m=125697124819563&w=2
>> So what is the take away from this now? Do you wanna have Johannes
>> step over John and Dave and send such a patch directly to you?
> The problem as i see it is the kind of answer Johannes gave when the bug
> was bisected to by Jeff Chua two days ago:
> ÂSubject: wpa2 hangs v2.6.32-rc5-402-gb6727b1. Revert
> Â Â Â Â Â 7d930bc33653d5592dc386a76a38f39c2e962344 fixed it.
> Â[ <1257151742.3555.165.camel@xxxxxxxxxxxxxx> ]
> Â| On Sun, 2009-11-01 at 23:18 +0800, Jeff Chua wrote:
> Â| > wpa2 (wpa_supplicant) hangs v2.6.32-rc5-402-gb6727b1.
> Â| Explain?
> Â| > Reverting 7d930bc33653d5592dc386a76a38f39c2e962344 fixes it.
> Â| Certainly not a good idea, will break when your AP denies association.
It certainly would have helped to have elaborated more on this and to
be fair let me elaborate a little more on the issue which the patch in
question tried to address.
After the merge window we have been getting WARN_ON()s on an extremely
rare situation which was very difficult to debug. The issue can be
seen even on the kerneloops.org track list . This also applied to
people running older kernels but running our stable compat-wireless
for 2.6.32 . Only a few of us have been able to trigger this
warning and report it but it was never possible to easily reproduce.
It usually took a day's worth of connectivity to reproduce.
Late in the merge window, right before the kernel summit I found a way
to finally reproduce and sent a one line patch to fix this  but as
noted by Johannes it was not the proper fix for a few reasons.
The unfortunate side of the issue is that once you hit the WARN_ON()
it is already too late and will simply not be able to associate to any
AP and hence Johannes' comment. The WARN_ON() was just a secondary
warning of shit already having hit the fan elsewhere. The problem was
reproducible whenever an Access Point denied association and you
typically would not run into this unless you mis-configure your
wireless settings or you happen to have some AP in some funny
situations (I believe in my case it was -E_TOO_MANY_PEOPLE_CONNECTED).
What this means is that the issue at hand was serious and it did need
to be fixed one way or another for 2.6.32.
At the kernel summit during the hacking session I poked Johannes and
we reviewed the issue at hand and tried to come up with a proper
solution. The solution were the two sets of patches, one of which
caused a regression as noted by a few people.
Truth is an oops is worse than a WARN_ON() and also loosing the
ability to associate to your AP. Because of that perhaps this should
have been reverted but one way or another this issue needed to be
fixed for good for 2.6.32. So unfortunately reverting the patch would
re-introduce an issue for all users.
How this sort of issue is dealt with is subjective and it is up to
maintainers to deal with.
> Unhelpful, defensive, in denial.
> Plus that you tried to berate Dmitry in this particular thread about the
> revert was pretty bad form too IMO.
> _Anyone_ who went through the unnecessary, avoidable cost of having to
> do a bisection of a 3 days old commit merged at around -rc5 time is in
> his full rights to ask for a revert, straight from Linus if he thinks
> so. No ifs and when about it.
> So IMO you are showing the wrong kind of attitude for a post-rc5
> regression, by a _wide_ margin. The right kind of attitude would be:
> Â"Oops, my bad - thanks. I've queued up a revert."
> Â"Oops, my bad - thanks. Does the attached patch fix it?
> Â If not we'll revert it."
> Furthermore, your 'hey, nothing happened, we fixed it after all'
> argument is just a forewarning that you learned nothing and such
> avoidable incidents could repeat in the future.
Having more information on the patch and better communication about
the issue it solved, and the issues that reverting it would have
caused would certainly have helped maintainers make a better call at a
regression caused by it but knowing Johannes he'd probably cook up a
followup fix ASAP and that is exactly what he did.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/