I think there is some memory-leak problem in linux-2.0.36, and it
looks like it's in the bridging code.
Here's the situation:
I installed a Linux system as a packet filtering router and remote
access server for a customer of mine 14 days ago. As I had had
very good experiences stress testing the linux-2.0.36 pre-releases,
I decided to install 2.0.36pre6.
The system has two ethernet cards (tulip driver), 32MB RAM,
AMD K6-200 CPU, 4 serial devices for incoming PPP connections
and an IDE HDD.
I've set up about 30 filtering rules, I use IP aliasing for
both interfaces (to provide IP address switch-over to a standby
system, just like HP-UX MC/SG... :-) and I also compiled
the bridging code in, though I didn't configure the bridge.
Over the next 10 days I noticed mysterious reboots roughly
every 3 days. I did some checks and noticed that over
time the system used more and more memory, up to the point
where it no longer responded to network packets or serial logins.
As I have the software watchdog installed, the system eventually
rebooted, and the whole game started again.
I then set up a similar system here at my office (with 2.0.36pre10),
and I could reproduce this behaviour.
I let the system under test run for several days and collected
lots of data using "vmstat".
"vmstat" showed a constant decrease in free+buffer+cache
memory, up to the point where almost everything was swapped
out, the system started to swap like crazy and stopped responding.
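For anyone who wants to put a number on such a trend, here is a rough
sketch of how the rate could be estimated from the logs. It assumes the
"free", "buff" and "cache" columns have already been extracted from the
vmstat output into an array; struct vm_sample, reclaimable_kb() and
trend_kb_per_sample() are names I made up for this illustration, and the
fit is an ordinary least-squares line through the samples.

```c
#include <assert.h>
#include <stddef.h>

/* One vmstat sample: the "free", "buff" and "cache" columns in KB.
 * Extracting them from the log is assumed to happen elsewhere. */
struct vm_sample {
	long free_kb;
	long buff_kb;
	long cache_kb;
};

/* Memory that is free or could be reclaimed without swapping. */
long reclaimable_kb(const struct vm_sample *s)
{
	return s->free_kb + s->buff_kb + s->cache_kb;
}

/*
 * Least-squares slope of reclaimable memory against the sample index,
 * in KB per sample.  A steadily negative value over a long run is what
 * a leak looks like in the logs.  Returns 0 for fewer than two samples.
 */
double trend_kb_per_sample(const struct vm_sample *s, size_t n)
{
	double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
	double denom;
	size_t i;

	assert(s != NULL || n == 0);
	if (n < 2)
		return 0.0;
	for (i = 0; i < n; i++) {
		double x = (double)i;
		double y = (double)reclaimable_kb(&s[i]);

		sx += x;
		sy += y;
		sxx += x * x;
		sxy += x * y;
	}
	/* Non-zero for n >= 2 because the sample indices are distinct. */
	denom = (double)n * sxx - sx * sx;
	return ((double)n * sxy - sx * sy) / denom;
}
```

Multiplying the result by the number of samples per hour gives a rough
KB/hour figure, which can be compared against the time between reboots.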
I then applied Ingo Molnar's memleak-deluxe patches and let
the system run for about 7 hours. The result shows a very large
number of allocations at br.c:889 (more than 43000,
where everything else is under 1000!).
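For readers who haven't used this kind of instrumentation: the idea is
to count, per source file and line, how many allocations are still
outstanding. The following is only a simplified user-space sketch of
that idea and not the memleak patch itself; tracked_alloc(),
tracked_free(), live_at() and the table layout are hypothetical names
chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SITES 64

/* One counter per (file, line) call site. */
struct alloc_site {
	const char *file;
	int line;
	long live;		/* allocated here and not yet freed */
};

static struct alloc_site sites[MAX_SITES];
static int nr_sites;

/* Header placed in front of every block so the free path finds its site. */
union alloc_hdr {
	struct alloc_site *site;
	max_align_t align;	/* keeps the payload suitably aligned */
};

/* Find the counter for a call site, creating it on first use. */
static struct alloc_site *site_lookup(const char *file, int line)
{
	int i;

	for (i = 0; i < nr_sites; i++)
		if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
			return &sites[i];
	if (nr_sites == MAX_SITES)
		return NULL;	/* table full: block stays untracked */
	sites[nr_sites].file = file;
	sites[nr_sites].line = line;
	sites[nr_sites].live = 0;
	return &sites[nr_sites++];
}

void *tracked_alloc(size_t size, const char *file, int line)
{
	union alloc_hdr *hdr = malloc(sizeof(*hdr) + size);

	if (hdr == NULL)
		return NULL;
	hdr->site = site_lookup(file, line);
	if (hdr->site)
		hdr->site->live++;
	return hdr + 1;
}

void tracked_free(void *ptr)
{
	union alloc_hdr *hdr;

	if (ptr == NULL)
		return;
	hdr = (union alloc_hdr *)ptr - 1;
	if (hdr->site) {
		assert(hdr->site->live > 0);
		hdr->site->live--;
	}
	free(hdr);
}

/* Records the caller's own file and line as the allocation site. */
#define TRACKED_ALLOC(size) tracked_alloc((size), __FILE__, __LINE__)

/* Outstanding allocations for one call site (0 if never seen). */
long live_at(const char *file, int line)
{
	int i;

	for (i = 0; i < nr_sites; i++)
		if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
			return sites[i].live;
	return 0;
}

/* Print every call site with its outstanding allocation count. */
void dump_sites(void)
{
	int i;

	for (i = 0; i < nr_sites; i++)
		printf("%s:%d %ld\n", sites[i].file, sites[i].line,
		       sites[i].live);
}
```

A call site whose count keeps growing while all others stay flat is the
prime suspect, which is the pattern seen for br.c:889 here.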
I recompiled the kernel without the bridge, and the problem went away!
The referenced line in br.c is in function "send_config_bpdu":
[...]
int send_config_bpdu(int port_no, Config_bpdu *config_bpdu)
{
    struct sk_buff *skb;
    struct device *dev = port_info[port_no].dev;
    int size;
    unsigned long flags;
    if (port_info[port_no].state == Disabled) {
        printk(KERN_DEBUG "send_config_bpdu: port %i not valid\n",port_no);
        return(-1);
    }
    if (br_stats.flags & BR_DEBUG)
        printk("send_config_bpdu: ");
    /*
     * create and send the message
     */
    size = sizeof(Config_bpdu) + dev->hard_header_len;
    skb = alloc_skb(size, GFP_ATOMIC);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ line 889
    if (skb == NULL) {
        printk(KERN_DEBUG "send_config_bpdu: no skb available\n");
        return(-1);
    }
    skb->dev = dev;
    skb->free = 1;
[...]
I don't know too much about this piece of code, but it looks like
it consumes a lot of memory which never gets freed! Maybe it's
because I compiled the bridge in but didn't configure it with brcfg?
Where should the skbs allocated in that function get freed again,
anyway?
Any ideas? I hope this report helps to find the problem. If
someone wants more data, I have several hundred KB of vmstat
logs, and even some nice graphs of the memory consumption
over time.
- andreas
-- 
Andreas Haumer         | email: andreas@xss.co.at | PGP key available
*x Software + Systeme  | phone: +43.1.6001508     | on request.
Buchengasse 67/8       |        +43.664.3004449   |
A-1100 Vienna, Austria | fax:   +43.1.6001507     | AH327-RIPE