net_bh limitation

From: Aaron Campbell (aaron@cs.dal.ca)
Date: Thu Jun 08 2000 - 11:47:24 EST


I have run into a problem with a loadable kernel module I've written. It
is a packet pre-processor that is intended to analyze packets before they
hit the ip_rcv() function. To illustrate the limitation of net_bh() I'm
faced with, consider the following piece of code (Version 2.2.14 sources,
somewhat reformatted to be easier to read here):

struct packet_type *ptype;
struct packet_type *pt_prev;

[...]

pt_prev = NULL;
for (ptype = ptype_all; ptype!=NULL; ptype=ptype->next) {
        if (!ptype->dev || ptype->dev == skb->dev) {
                if (pt_prev) {
                        struct sk_buff *skb2=skb_clone(skb, GFP_ATOMIC);

                        if (skb2)
                                pt_prev->func(skb2,skb->dev,pt_prev);
                }
                pt_prev = ptype;
        }
}

for (ptype = ptype_base[ntohs(type)&15]; ptype!=NULL; ptype=ptype->next) {
        if (ptype->type == type && (!ptype->dev || ptype->dev==skb->dev)) {
                if (pt_prev) {
                        struct sk_buff *skb2;

                        skb2=skb_clone(skb, GFP_ATOMIC);
                        if (skb2)
                                pt_prev->func(skb2, skb->dev, pt_prev);
                }
                pt_prev = ptype;
        }
}

if (pt_prev)
        pt_prev->func(skb, skb->dev, pt_prev);
else {
        kfree_skb(skb);
}

[...]

Normally, the ptype_all list is empty, so on most Linux systems the first
for() loop does not run at all (as far as I understand). In my init_module()
function I am registering a new packet type (my_rcv) that handles packets
of type ETH_P_ALL such that I intercept all packets.

Note that when the first for() loop ends, pt_prev will be set to my packet
type. When the second for() loop runs, it will eventually match ETH_P_IP.
Then the "if (pt_prev)" test will pass, the skb will be cloned, and the
clone will be passed along to my_rcv. HOWEVER, no mechanism is provided to
allow the packet processing to stop at my_rcv. When my_rcv returns, pt_prev
is set to the IP packet type, the for() loop terminates, and ip_rcv() has
no choice but to run on the original skb.
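
To make the control flow concrete, here is a minimal user-space model of
the deferred-delivery pattern in the excerpt above. It is only a sketch:
the structures are simplified stand-ins rather than the real kernel
layouts, a struct copy stands in for skb_clone(), and ip_rcv_model,
deliver, log_call and call_log are hypothetical names introduced for
illustration. It shows that the tap handler's return value is never
consulted, so the IP handler still runs.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (not the real layouts). */
struct sk_buff { int id; };

struct packet_type {
        unsigned short type;
        int (*func)(struct sk_buff *skb, struct packet_type *pt);
        struct packet_type *next;
};

/* Records the order in which handlers are invoked. */
enum { CALL_MY_RCV = 1, CALL_IP_RCV = 2, MAX_CALLS = 8 };
int call_log[MAX_CALLS];
int n_calls;

void log_call(int who)
{
        if (n_calls < MAX_CALLS)
                call_log[n_calls++] = who;
}

/* The tap handler tries to signal "drop" through its return value. */
int my_rcv(struct sk_buff *skb, struct packet_type *pt)
{
        (void)skb; (void)pt;
        log_call(CALL_MY_RCV);
        return -1;
}

/* Hypothetical stand-in for ip_rcv(). */
int ip_rcv_model(struct sk_buff *skb, struct packet_type *pt)
{
        (void)skb; (void)pt;
        log_call(CALL_IP_RCV);
        return 0;
}

/* Mirrors the two loops: each handler is called one step late via pt_prev. */
void deliver(struct packet_type *all, struct packet_type *base,
             unsigned short type, struct sk_buff *skb)
{
        struct packet_type *ptype, *pt_prev = NULL;

        for (ptype = all; ptype != NULL; ptype = ptype->next) {
                if (pt_prev) {
                        struct sk_buff clone = *skb; /* models skb_clone() */
                        pt_prev->func(&clone, pt_prev);
                }
                pt_prev = ptype;
        }

        for (ptype = base; ptype != NULL; ptype = ptype->next) {
                if (ptype->type == type) {
                        if (pt_prev) {
                                struct sk_buff clone = *skb;
                                /* return value is ignored, so no veto */
                                pt_prev->func(&clone, pt_prev);
                        }
                        pt_prev = ptype;
                }
        }

        /* The last matching handler always receives the original skb. */
        if (pt_prev)
                pt_prev->func(skb, pt_prev);
}
```

With one tap entry (type 0x0003, ETH_P_ALL) on the first list and one IP
entry (type 0x0800, ETH_P_IP) on the second, deliver() calls my_rcv on a
copy and then ip_rcv_model on the original, regardless of what my_rcv
returned.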

I want my_rcv to be able to choose between freeing the skb and stopping
the kernel's processing of this packet completely, or just returning
gracefully and allowing ip_rcv to take care of it.

My workaround is ugly. At the end of my_rcv, if I want packet processing
to end there, I modify the return address on the stack so that it points
at the kfree_skb() call, bypassing the call to ip_rcv(). This single line
of inline assembly accomplishes that on my machine:

__asm__("addl $47,4(%ebp)");

I want this module to be completely self-contained (i.e., no kernel
patches) and work on all 2.2.x and above kernels. Unfortunately, the
offset I've used in the above asm call (47) is completely unportable and
just happens to work with this kernel version that was compiled with
whatever specific version of gcc I have. On another machine, this offset
was different. (BTW, I'm using SGI's kdb for my debugging, and to
discover the right offset.)

I don't mind a kludge, but I haven't been able to find a way to
dynamically calculate the offset between "call *%eax" (my_rcv) and
"call __kfree_skb" in the net_bh function. I know that the net_bh function
is not an exported symbol, so its address is not readily available in
/proc/ksyms. The kfree_skb() function is exported but I'm not sure that
helps me at all. It would be preferable to calculate the offset once at
module init time.

Would it be possible in my_rcv to search memory starting at the return
address forward until we identify the "call __kfree_skb", and maintain
some sort of counter? How can I identify a "call __kfree_skb" based on
memory contents?
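
For illustration, one way such a scan might look is sketched below. On
i386 a near relative call is encoded as the opcode byte 0xE8 followed by a
32-bit little-endian displacement, and the destination is the address of
the next instruction plus that displacement. The function name
find_call_rel32 and its parameters are hypothetical, and the sketch works
on a plain byte buffer in user space; it assumes the compiler emitted a
direct relative call and that the target's address is known (for example
from an exported symbol).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan len bytes of machine code for an i386 near call (opcode 0xE8 plus
 * a 32-bit little-endian displacement) whose destination equals target.
 * code_addr is the 32-bit virtual address at which code[0] resides.
 * Returns the byte offset of the 0xE8 opcode, or -1 if no match is found.
 */
long find_call_rel32(const unsigned char *code, size_t len,
                     uint32_t code_addr, uint32_t target)
{
        size_t i;

        for (i = 0; i + 5 <= len; i++) {
                uint32_t disp, next;

                if (code[i] != 0xE8)
                        continue;

                /* Assemble the displacement byte by byte (little-endian). */
                disp = (uint32_t)code[i + 1] |
                       ((uint32_t)code[i + 2] << 8) |
                       ((uint32_t)code[i + 3] << 16) |
                       ((uint32_t)code[i + 4] << 24);

                /* The displacement is relative to the next instruction. */
                next = code_addr + (uint32_t)i + 5;

                /* 32-bit wraparound handles backward (negative) calls. */
                if ((uint32_t)(next + disp) == target)
                        return (long)i;
        }
        return -1;
}
```

Because the scan is bytewise and does not decode instruction boundaries, a
stray 0xE8 inside another instruction could in principle match, although
requiring the displacement to resolve to the exact target makes that
unlikely. The offset returned is that of the call opcode itself; whether
that is the right place to resume depends on how the compiler set up the
argument before the call.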

I'm kind of lost at this point; any help would be appreciated.

Aaron
