page fault problems porting a network driver to 2.4.x

From: Hen, Shmulik (shmulik.hen@intel.com)
Date: Tue Oct 24 2000 - 13:21:23 EST


> Hello,
>
> We are developing an advanced networking services loadable module and are
> having problems porting it to work on 2.4.x kernels. The driver is
> supposed to provide services such as fault tolerance, load balancing and
> link aggregation over a team of network adapters. It works OK on 2.2.x
> kernels but hangs on 2.4.x kernels.
>
> In order to debug it, we stripped it down to become a mere "intermediate"
> or "filter" driver that binds to a base driver and passes everything
> through in both directions (Rx, Tx, IOCTL, stats, etc.). After going
> through the basics of modifying the driver to compile on 2.4.x kernels and
> fighting some nasty deadlocks caused by the changes to the networking
> layer, we managed to get it to run. The driver receives and transmits a
> few hundred thousand packets (with a periodic timer expiring 10 times a
> second and IOCTLs running continuously), and then it causes an oops
> reporting that a page fault could not be handled.
>
> The function looks something like:
>
> int iansHardStartXmit(struct sk_buff *skb, struct net_device *dev) {
>         int res = 1;    /* stays non-zero if no base driver is found */
>         struct net_device *base;
>
>         spin_lock(&lock);
>         base = get_base_driver_by_name(name);
>
>         if (base != NULL) {
>                 /* pass the packet straight through to the base driver */
>                 res = base->hard_start_xmit(skb, base);
>         }
>
>         spin_unlock(&lock);
>         return res;
> }
>
> We used kdb to track down the problem and found the following stack
> trace:
>
> EBP        EIP        function(args)
> 0xc4cd1c54 0xd081e3e7 [e100]__kallsyms+0xb (0xc4b595a0, 0xc840f200)
>                       e100 __kallsyms 0xd081e3dc 0xd081e3dc 0xd0820dsc
>            0xd08244ba [ians]iansHardStartXmit+0xa6 (0xc4b595a0, 0xc4d9bc00)
>                       ians .text 0xd0824060 0xd0824414 0xd082452c
>            0xc01f9d1f qdisc_restart+0xcf (0xc4d9bc00)
>                       kernel .text 0xc0100000 0xc01f9c50 0xc01f9f14
> *
> *
> *
>
> This goes on and shows that this is an ICMP echo reply packet going down
> through the IP stack to the filter driver (apparently 0xc4b595a0 is the
> skb, 0xc4d9bc00 is the *dev of the filter driver and 0xc840f200 is the
> *dev of the base driver). The filter driver is supposed to call the
> dev->hard_start_xmit of the base driver, but strangely it lands somewhere
> in the data segment of the base driver (__kallsyms is a part of the symbol
> table of the module according to insmod -m).
> Suspecting that the dev->hard_start_xmit pointer was getting trashed
> somehow, we added a check to make sure the same pointer is always called,
> and the check confirmed that the pointer never changes. Looking at the
> assembly code with kdb, we could see that the call to the base driver is
> done by a 'call *%eax' instruction. kdb reports that eax=0xffffffff after
> the page fault (origeax).
>
> How is it possible that the pointer to the function keeps its value, but
> the jump to that function lands somewhere else?
> The entire function is protected by a spinlock, so there should be no
> risk of other threads corrupting our data.
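
The pointer check described above can be sketched in user space as follows. This is a minimal model, not the ians driver code: the struct definitions are simplified stand-ins for the kernel types, and expected_xmit, bind_base, base_xmit and checked_xmit are hypothetical names introduced only for illustration. The idea is to read the function pointer once into a local variable, compare that local against the value recorded at bind time, and then call through the same local, so that the value that was checked is also the value that is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types (assumption, not the real layout). */
struct sk_buff { int len; };
struct net_device;
typedef int (*xmit_fn)(struct sk_buff *skb, struct net_device *dev);
struct net_device {
    const char *name;
    xmit_fn hard_start_xmit;
};

/* Hypothetical base-driver transmit routine: 0 means the packet was accepted. */
static int base_xmit(struct sk_buff *skb, struct net_device *dev)
{
    (void)dev;
    return skb->len > 0 ? 0 : 1;
}

/* Transmit pointer recorded once, when the filter binds to the base driver. */
static xmit_fn expected_xmit;

static void bind_base(struct net_device *base)
{
    expected_xmit = base->hard_start_xmit;
}

/*
 * Pass-through transmit with a sanity check.
 * Returns -1 if there is no base driver or its transmit pointer is NULL or
 * differs from the one recorded at bind time; otherwise returns whatever the
 * base driver's routine returns.
 */
static int checked_xmit(struct sk_buff *skb, struct net_device *base)
{
    xmit_fn fn;

    if (base == NULL)
        return -1;

    fn = base->hard_start_xmit;     /* read the pointer exactly once */
    if (fn == NULL || fn != expected_xmit) {
        fprintf(stderr, "hard_start_xmit pointer changed or missing\n");
        return -1;
    }

    return fn(skb, base);           /* call the value that was just checked */
}
```

With a base device whose hard_start_xmit is base_xmit, calling bind_base() and then checked_xmit() forwards the packet and returns the base driver's result. If the pointer is overwritten afterwards, checked_xmit() returns -1 instead of jumping through it. A check like this only narrows down where the value goes wrong; it does not by itself explain the oops.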
>
> We are using:
> RedHat 6.2
> gcc v2.91.66
> modutils v2.3.11-1
> kernel linux-2.4.0-test9
> kdb v1.5-2.4.0-test9-pre9
> Compaq ap500 dual p-III Xeon
>
>
> Thanks,
> Shmulik Hen
>
> Software Engineer
> Linux Advanced Networking Services
> Network Communications Group, Israel (NCGj)
> Intel Corporation Ltd.
>
>



