linux-2.0.36pre10: memory-leak in bridging code?

Andreas Haumer (andreas@xss.co.at)
Mon, 21 Sep 1998 00:24:12 +0200


Hi Alan, hi all!

I think there is some memory-leak problem in linux-2.0.36, and it
looks like it's in the bridging code.

Here's the situation:

I installed a linux system as packet filtering router and remote
access server for a customer of mine 14 days ago. As I had
very good experience with stress testing the linux-2.0.36 pre-
releases, I decided to install 2.0.36pre6.

The system has two ethernet cards (tulip driver), 32MB RAM,
AMD K6-200 CPU, 4 serial devices for incoming PPP connections
and an IDE HDD.

I've set up about 30 filtering rules, I use IP aliasing for
both interfaces (to provide IP address switch-over to a standby
system, just like HP-UX MC/SG... :-) and I also compiled
bridging code in, though I didn't configure the bridge.

Over the next 10 days I noticed mysterious reboots roughly
every 3 days. I did some checks and noticed that over time
the system used more and more memory, up to the point
where it didn't respond to network packets or serial logins.
As I have the software watchdog installed, the system eventually
rebooted, and the whole game started again.

I then set up a similar system here at my office (with 2.0.36pre10),
and I could reproduce this behaviour.
I let the system under test run for several days and collected
lots of data using "vmstat".
"vmstat" showed a constant decrease in free+buffer+cache
memory, up to the point where almost everything was swapped
out, the system started to swap like crazy and stopped responding.
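
To turn a vmstat log like this into a single number, one can sum the
free, buffer and cache columns of each sample and look at the average
change per hour. The following is only a minimal sketch under stated
assumptions: the struct and function names are made up for
illustration, and it assumes the samples were taken at a fixed
interval.

```c
#include <assert.h>
#include <stddef.h>

/* One vmstat sample: the three memory columns of interest, in KB.
 * (Hypothetical struct, not part of vmstat itself.) */
struct mem_sample {
        long free_kb;
        long buff_kb;
        long cache_kb;
};

/* free + buffer + cache for one sample */
static long available_kb(const struct mem_sample *s)
{
        return s->free_kb + s->buff_kb + s->cache_kb;
}

/*
 * Average change of free+buffer+cache in KB per hour between the
 * first and the last sample. A negative result means memory is
 * shrinking. Assumes a fixed sampling interval of interval_s seconds.
 */
static double trend_kb_per_hour(const struct mem_sample *samples,
                                size_t n, unsigned int interval_s)
{
        double elapsed_h;

        assert(n >= 2 && interval_s > 0);
        elapsed_h = (double)(n - 1) * interval_s / 3600.0;
        return (available_kb(&samples[n - 1]) -
                available_kb(&samples[0])) / elapsed_h;
}
```

As an invented example (not taken from the logs mentioned here): three
samples taken 1800 seconds apart whose totals fall from 20000 KB to
19000 KB give a trend of -1000 KB per hour.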

I then applied Ingo Molnar's memleak-deluxe patches and let
it run for about 7 hours. The result shows a huge number of
allocations at br.c:889 (more than 43000, where everything
else is under 1000!).
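
For readers unfamiliar with this kind of report, the idea of counting
allocations per source location can be sketched in user space as
follows. This is NOT the memleak patch itself and makes no claim about
how it is implemented; all names (tracked_alloc, tracked_free, live_at,
TRACKED_ALLOC) are hypothetical and only illustrate how a figure like
"br.c:889: 43000" can come about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SITES 64

/* One entry per allocation site (file:line). */
struct alloc_site {
        const char *file;
        int line;
        long live;              /* blocks allocated here and not yet freed */
};

static struct alloc_site sites[MAX_SITES];
static int nr_sites;

/* Header stored in front of every block; the union keeps the
 * user data suitably aligned. */
union alloc_hdr {
        int site;
        max_align_t align;
};

/* Look up a site, creating it on first use; -1 if the table is full. */
static int find_site(const char *file, int line)
{
        int i;

        for (i = 0; i < nr_sites; i++)
                if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
                        return i;
        if (nr_sites == MAX_SITES)
                return -1;
        sites[nr_sites].file = file;
        sites[nr_sites].line = line;
        sites[nr_sites].live = 0;
        return nr_sites++;
}

static void *tracked_alloc(size_t size, const char *file, int line)
{
        int site = find_site(file, line);
        union alloc_hdr *hdr;

        if (site < 0)
                return NULL;
        hdr = malloc(sizeof(*hdr) + size);
        if (hdr == NULL)
                return NULL;
        hdr->site = site;       /* remember where the block came from */
        sites[site].live++;
        return hdr + 1;
}

static void tracked_free(void *ptr)
{
        union alloc_hdr *hdr;

        if (ptr == NULL)
                return;
        hdr = (union alloc_hdr *)ptr - 1;
        sites[hdr->site].live--;
        free(hdr);
}

/* Number of blocks from file:line that are still allocated. */
static long live_at(const char *file, int line)
{
        int i;

        for (i = 0; i < nr_sites; i++)
                if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
                        return sites[i].live;
        return 0;
}

/* Records the caller's own file and line automatically. */
#define TRACKED_ALLOC(size) tracked_alloc((size), __FILE__, __LINE__)
```

A site whose live count only ever grows while all others stay small
is the kind of outlier the report above shows for br.c:889.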

I recompiled the kernel without the bridge, and the problem went away!

The referenced line in br.c is in function "send_config_bpdu":

[...]
int send_config_bpdu(int port_no, Config_bpdu *config_bpdu)
{
        struct sk_buff *skb;
        struct device *dev = port_info[port_no].dev;
        int size;
        unsigned long flags;

        if (port_info[port_no].state == Disabled) {
                printk(KERN_DEBUG "send_config_bpdu: port %i not valid\n",port_no);
                return(-1);
        }
        if (br_stats.flags & BR_DEBUG)
                printk("send_config_bpdu: ");
        /*
         * create and send the message
         */
        size = sizeof(Config_bpdu) + dev->hard_header_len;
        skb = alloc_skb(size, GFP_ATOMIC);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ line 889
        if (skb == NULL) {
                printk(KERN_DEBUG "send_config_bpdu: no skb available\n");
                return(-1);
        }
        skb->dev = dev;
        skb->free = 1;
[...]

I don't know too much about this piece of code, but it looks like
it consumes a lot of memory which never gets freed! Maybe it's
because I compiled the bridge in but didn't configure it with brcfg?
Where should the skbs allocated in that function get freed again,
anyway?

Any ideas? I hope this report helps in finding the problem. If
someone wants more data, I have several hundred KB of vmstat
logging information, and even some nice graphical statistics
about the memory consumption over time.

- andreas

-- 
 Andreas Haumer         | email: andreas@xss.co.at | PGP key available
 *x Software + Systeme  | phone: +43.1.6001508     | on request.
 Buchengasse 67/8       |        +43.664.3004449   |   
 A-1100 Vienna, Austria |   fax: +43.1.6001507     | AH327-RIPE
