Re: exit_mmap BUG_ON in 2.6.23 (and Add qdisc __NET_XMIT_STOLEN)

From: Sam Portolla
Date: Wed May 30 2012 - 20:35:47 EST

----- Original Message -----
From: Eric Dumazet <eric.dumazet@xxxxxxxxx>
To: Sam Portolla <samportolla@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>; "kaber@xxxxxxxxx" <kaber@xxxxxxxxx>; "jarkao2@xxxxxxxxx" <jarkao2@xxxxxxxxx>; "davem@xxxxxxxxxxxxx" <davem@xxxxxxxxxxxxx>; "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>
Sent: Friday, May 25, 2012 11:16 PM
Subject: Re: exit_mmap BUG_ON in 2.6.23 (and Add qdisc __NET_XMIT_STOLEN)

On Fri, 2012-05-25 at 22:27 -0700, Sam Portolla wrote:

> Yes, thanks. I had looked at the kernel code and know how transmit
> timeouts come about in normal cases. The driver specifies a timeout
> period to the network layer, along with a callback function to call in
> case of a Tx timeout, so the driver can do error handling, which is
> typically to reset the driver (this happened with the BNX2 Linux
> driver our system uses as well). Above I had asked some specific
> questions about whether a known qdisc bug could stop the Tx queues
> to the device and thereby cause traffic timeouts. Also, it seems from
> the email thread on the patch I mentioned above that the qdisc issue
> can cause memory corruption, which could then tie it in with the
> BUG_ON in exit_mmap() that Hugh had previously commented on. I am
> hoping the engineers who fixed the qdisc issue can comment on the
> former and Hugh can comment on the BUG_ON again. Regards.


The commit you mention is about a very unusual use of qdiscs.
I really doubt it is your problem.
Most advanced tc users probably won't stick with 2.6.23 kernels.

Please post:

tc -s -d qdisc

And for all your network devices:

for DEV in eth0 eth1 eth2
do
    tc -s -d class show dev "$DEV"
done
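
If the interface names are not known in advance, the same queries can be run over whatever the kernel lists under /sys/class/net. The following is only a sketch and not part of the original request: list_devs and dump_tc are hypothetical helper names, and it assumes sysfs is mounted at /sys and that tc (shipped with the iproute2 package, often installed in /sbin or /usr/sbin) is present.

```sh
#!/bin/sh
# Sketch: run the tc queries for every interface the kernel exposes,
# instead of a hard-coded "eth0 eth1 eth2" list.

# list_devs [DIR]: print the name of each interface directory under DIR.
# DIR defaults to /sys/class/net (assumption: sysfs is mounted at /sys).
list_devs() {
    dir=${1:-/sys/class/net}
    for path in "$dir"/*; do
        # Interfaces appear as directories (symlinks to them in sysfs).
        # This also skips plain files such as bonding_masters and the
        # unexpanded glob when DIR is empty or missing.
        [ -d "$path" ] || continue
        basename "$path"
    done
}

# dump_tc: print qdisc statistics, then class statistics per device.
# The body runs in a subshell so the PATH change does not leak out.
dump_tc() (
    # tc often lives in /sbin or /usr/sbin, which may not be on the
    # PATH of a non-root user.
    PATH=$PATH:/sbin:/usr/sbin
    if ! command -v tc >/dev/null 2>&1; then
        echo "tc not found; is iproute2 installed?" >&2
        exit 1
    fi
    tc -s -d qdisc
    for DEV in $(list_devs); do
        echo "== $DEV =="
        tc -s -d class show dev "$DEV"
    done
)
```

To use it, source the file (or paste the functions into a shell) and run dump_tc, then post the output. The interface list is taken from sysfs rather than from parsing another tool's output, so the only external command needed is tc itself.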

Hi Eric,

Can you please elaborate on what you mean by the commit being "about a very unusual use of qdiscs"?
At the time, the lack of this fix was determined to cause the Ethernet driver to dereference a NULL pointer in an SKB on its Tx ring, which is what we saw in our case as well. The qdisc code was apparently changing the "nr_frags" field in the SKB while the driver owned the SKB, causing the issue.

I can't find the "tc" command mentioned above on our system. I tried from the hard disk directory as well as the dev directory.
What is "tc", and could you please paste how you run it on your system?

Also, the 2.6.23 GNU/Linux kernel we use is not fully compatible with the previously mentioned qdisc commit. For example, there is no qdisc_enqueue() function in our baseline, and the same goes for some of the other code, so backporting this patch seems risky. If we do go with backporting it, I can post my diffs here, and I would really appreciate a review from you. Regards.