// ksoftirqd at 99%. caused by certain network activity? //

From: Julian Oliver
Date: Mon Jan 23 2006 - 11:54:41 EST



hi lists,

what follows is an unusual account of something that has now happened to
my system on four occasions. after poking around all over the place, i
can't find any concrete cause, and can only find a few others who have
had similar experiences, with no clear fix.

yes, i am using ipw2200-1.0.8. even though i'm not using a driver shipped with
the kernel, i notice that people in the linux-kernel threads linked below had similar
problems with cards whose drivers do ship with the kernel, notably a realtek.
there is no general ipw2200 list, so i've cc'd the ipw2200 devel list, since
that's where this driver is looked after.

it may be inappropriate for me to post to linux-net, but as i'm fairly certain it's network
activity that's bringing this condition on, i thought i'd give it a shot.

if you can tell me that this condition is fixed in kernels newer than 2.6.14,
i'll upgrade right away. i didn't recognise anything in the changelog that might remedy this.
it may be a driver-level issue, but since some realtek users have also
experienced it, i'm writing here.

i would upgrade the kernel anyway, but as i'm very happy with my current kernel in all other respects
(including third-party drivers) i'd rather leave it as is until some jaw-dropping new feature
encourages me otherwise.

anyway, this is what happens:

//---------------------------------------------------------------->

at 2:45am my fan starts screaming and my bandwidth drops to nothing.

netstat -tupa shows no active connections, but top shows ksoftirqd running hot:

//---------------------------------------------------------------->

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    2 root      39  19     0    0    0 R 94.2  0.0   5:17.74 ksoftirqd/0
 4489 debian-t  16   0  9428 7740 1764 S  3.3  1.5   0:07.57 tor
15691 delire    15   0 70856  21m 6920 S  1.7  4.2   4:04.07 skype
 4927 root       5 -10 61836  45m 2576 S  0.3  9.1   7:09.90 Xorg
 8711 delire    15   0  137m  59m  16m S  0.3 11.7   0:42.14 firefox-bin
 1361 root      16   0  2252 1124  840 R  0.3  0.2   0:00.03 top

[...]

//<----------------------------------------------------------------
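since the spike started at 2:45am with nobody watching, it may help to log ksoftirqd's CPU share over time instead of relying on top. below is a minimal python sketch, assuming the /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15; the function names are hypothetical and not part of any existing tool.

```python
import os
import time
from typing import Dict, Tuple


def parse_stat(line: str) -> Tuple[str, int]:
    """Return (comm, utime + stime in clock ticks) for one /proc/<pid>/stat line."""
    # comm sits inside parentheses and may contain spaces, so split on the
    # last ')' before counting the remaining whitespace-separated fields.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # fields[0] is the state (stat field 3), so utime (field 14) and
    # stime (field 15) sit at indices 11 and 12.
    return comm, int(fields[11]) + int(fields[12])


def ksoftirqd_ticks(proc: str = "/proc") -> Dict[int, Tuple[str, int]]:
    """Map pid -> (comm, cpu ticks) for every ksoftirqd kernel thread."""
    result: Dict[int, Tuple[str, int]] = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, ticks = parse_stat(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if comm.startswith("ksoftirqd"):
            result[int(entry)] = (comm, ticks)
    return result


def cpu_percent(ticks_before: int, ticks_after: int,
                interval: float, clk_tck: int) -> float:
    """CPU share over `interval` seconds, on the same scale top uses per CPU."""
    return 100.0 * (ticks_after - ticks_before) / clk_tck / interval


def sample_ksoftirqd(interval: float = 10.0) -> None:
    """Print one timestamped line per ksoftirqd thread."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = ksoftirqd_ticks()
    time.sleep(interval)
    after = ksoftirqd_ticks()
    stamp = time.strftime("%H:%M:%S")
    for pid, (comm, ticks) in sorted(after.items()):
        if pid in before:
            pct = cpu_percent(before[pid][1], ticks, interval, clk_tck)
            print(f"{stamp} {comm} pid={pid} cpu={pct:.1f}%")
```

leaving something like `while True: sample_ksoftirqd(60)` running overnight would show when the load starts and whether it lines up with eth1 coming up.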

tailing syslog, i find my firewall is working hard, with a few TOR routers in there:

dropworld:/home/delire# tail -f /var/log/syslog
Jan 13 02:52:36 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=195.169.149.45 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=45235 DF PROTO=TCP SPT=44106 DPT=9001 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 02:52:37 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=18.244.0.188 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=7034 DF PROTO=TCP SPT=59169 DPT=9001 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 02:52:37 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=192.168.2.1 LEN=62 TOS=0x00 PREC=0x00 TTL=64 ID=49284 DF PROTO=UDP SPT=32982 DPT=53 LEN=42
Jan 13 02:52:39 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=195.169.149.45 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=45237 DF PROTO=TCP SPT=44106 DPT=9001 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 02:52:39 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=128.112.154.99 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=59936 DF PROTO=TCP SPT=54120 DPT=443 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 02:52:40 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=217.160.135.169 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=12546 DF PROTO=TCP SPT=41396 DPT=9001 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 02:52:42 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=128.112.154.99 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=59938 DF PROTO=TCP SPT=54120 DPT=443 WINDOW=5840 RES=0x00 SYN URGP=0
Jan 13 02:52:42 localhost kernel: Shorewall:fw2net:ACCEPT:IN= OUT=eth1 SRC=192.168.2.34 DST=192.168.2.1 LEN=62 TOS=0x00 PREC=0x00 TTL=64 ID=49285 DF PROTO=UDP SPT=32982 DPT=53 LEN=42
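to see at a glance which destinations dominate the firewall log, the KEY=VALUE fields in these netfilter LOG lines can be tallied. a rough python sketch, assuming the Shorewall log prefix format shown above; parse_shorewall_line and summarize are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_shorewall_line(line: str) -> Dict[str, str]:
    """Pull the KEY=VALUE fields out of one netfilter LOG line.

    Bare flags such as DF or SYN are skipped. The first occurrence of a key
    wins, so the IP-level LEN is kept and the UDP header's LEN is ignored.
    """
    fields: Dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue
        # "Shorewall:fw2net:ACCEPT:IN=" carries the log prefix glued to IN.
        key = key.rsplit(":", 1)[-1]
        if key.isupper() and key not in fields:
            fields[key] = value
    return fields


def summarize(lines: Iterable[str]) -> Counter:
    """Count logged packets per (destination, protocol, destination port)."""
    counts: Counter = Counter()
    for line in lines:
        f = parse_shorewall_line(line)
        if "DST" in f and "DPT" in f:
            counts[(f["DST"], f.get("PROTO", "?"), f["DPT"])] += 1
    return counts
```

fed the lines from /var/log/syslog, `summarize(lines).most_common(10)` lists the busiest destinations. in the excerpt above, source port 44106 shows up twice with SYN set towards the same host on port 9001, which looks like a retransmitted connection attempt rather than a flood of new traffic.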


//---------------------------------------------------------------->

taking eth1 down caused ksoftirqd to drop immediately to sub-percent CPU load, and my fan stopped screaming.
everything returned to normal.

bringing eth1 up again 10 minutes later reproduces the problem, even after all common network-using services (browsers etc.) are killed,
and i cannot get an IP with pump because my box is being hit so hard.

at this stage, ** i'm not even on the local network **. ifconfig/ping/tcpdump confirm this.
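to put a number on "not even on the network", the per-interface packet counters in /proc/net/dev can be sampled while ksoftirqd is pegged: if its load stays high while eth1's counters barely move, that would point at the driver or interrupt path rather than traffic volume. a minimal python sketch, assuming the usual /proc/net/dev layout (two header lines, eight receive columns, then the transmit columns); the helper names are hypothetical.

```python
import time
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_packets, tx_packets) from /proc/net/dev."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Eight receive columns (bytes, packets, ...) come first, then the
        # transmit columns, so tx packets is the tenth number (index 9).
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_rates(iface: str, interval: float = 5.0,
                 path: str = "/proc/net/dev") -> Tuple[float, float]:
    """Return (rx packets/s, tx packets/s) for `iface` over `interval` seconds."""
    with open(path) as f:
        rx0, tx0 = parse_net_dev(f.read())[iface]
    time.sleep(interval)
    with open(path) as f:
        rx1, tx1 = parse_net_dev(f.read())[iface]
    return (rx1 - rx0) / interval, (tx1 - tx0) / interval
```

e.g. `print(packet_rates("eth1"))` while the fan is screaming, compared with the same call on a quiet evening.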

after rebooting the system and bringing the interface up, i find the machine is in a normal state.

later, with TOR disabled, exactly the same thing happened while i was talking on skype, downloading via ftp and
uploading a file at the same time.

what's messing with my soft interrupts? ACPI? USB? or does network traffic send the kernel into a spin?
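one way to narrow down which device is driving the soft interrupts is to sample /proc/interrupts twice and rank IRQ lines by rate. this counts hardware interrupts, not softirqs (/proc/softirqs only appeared in later kernels), but a card raising a flood of interrupts would show up here. a rough python sketch; the function names are hypothetical, and the parser assumes the usual layout of a CPU header row followed by one row per IRQ.

```python
import time
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[int, str]]


def parse_interrupts(text: str) -> Snapshot:
    """Map IRQ label -> (total count across CPUs, device description)."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: "CPU0 CPU1 ..."
    table: Snapshot = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        parts = rest.split()
        counts: List[int] = []
        for tok in parts[:ncpu]:
            if not tok.isdigit():
                break  # some rows (e.g. ERR) carry fewer per-CPU columns
            counts.append(int(tok))
        table[label.strip()] = (sum(counts), " ".join(parts[len(counts):]))
    return table


def interrupt_rates(before: Snapshot, after: Snapshot,
                    interval: float) -> List[Tuple[str, float, str]]:
    """Rank IRQ lines by interrupts per second between two snapshots."""
    rates = []
    for irq, (count, desc) in after.items():
        # An IRQ missing from the first snapshot is treated as rate 0.
        prev = before.get(irq, (count, desc))[0]
        rates.append((irq, (count - prev) / interval, desc))
    rates.sort(key=lambda r: r[1], reverse=True)
    return rates


def sample_interrupts(interval: float = 5.0,
                      path: str = "/proc/interrupts") -> List[Tuple[str, float, str]]:
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return interrupt_rates(before, after, interval)
```

running `sample_interrupts()[:5]` during an episode and again after `ifconfig eth1 down` would show whether the busiest line is the one listing ipw2200. several devices can share one IRQ line, though, so a busy line listing both uhci_hcd and ipw2200 would still need the ifconfig test to tell them apart.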

i found a couple of other people on completely different architectures who had the same problem, with no real solution other than kernel developers
looking into it. it does seem related to network activity:

http://readlist.com/lists/vger.kernel.org/linux-kernel/5/29182.html

as it's a long thread, here's a summary:

http://readlist.com/lists/vger.kernel.org/linux-kernel/6/30718.html

they put their problem down to heavy load on a network interface. either way, it would seem that you can bring a linux system to
its knees by putting a network interface under heavy load (e.g. in a DoS fashion).

of note is that, before these recent incidents, this had never happened to my system in 1.5 years of daily use.




//---------------------------------------------------------------->

$ lspci

0000:00:00.0 Host bridge: Intel Corporation 82855PM Processor to I/O Controller (rev 21)
0000:00:01.0 PCI bridge: Intel Corporation 82855PM Processor to AGP Controller (rev 21)
0000:00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 03)
0000:00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 03)
0000:00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 03)
0000:00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 03)
0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev 83)
0000:00:1f.0 ISA bridge: Intel Corporation 82801DBM (ICH4-M) LPC Interface Bridge (rev 03)
0000:00:1f.1 IDE interface: Intel Corporation 82801DBM (ICH4-M) IDE Controller (rev 03)
0000:00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 03)
0000:00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 03)
0000:00:1f.6 Modem: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Modem Controller (rev 03)
0000:01:00.0 VGA compatible controller: ATI Technologies Inc RV350 [Mobility Radeon 9600 M10]
0000:02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5788 Gigabit Ethernet (rev 03)
0000:02:01.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ac)
0000:02:01.1 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ac)
0000:02:01.2 FireWire (IEEE 1394): Ricoh Co Ltd R5C552 IEEE 1394 Controller (rev 04)
0000:02:02.0 Network controller: Intel Corporation PRO/Wireless 2200BG (rev 05)

//<----------------------------------------------------------------

$ uname -a

Linux ***** 2.6.14 #7 PREEMPT Fri Dec 9 03:39:41 CET 2005 i686 GNU/Linux

//---------------------------------------------------------------->


--
http://selectparks.net/~julian
