Mysterious crashes (IP masq related?)

Bjarni R. Einarsson (bre@margmidlun.is)
Wed, 30 Dec 1998 14:02:58 +0000


Hi all,

I'm experiencing some rather strange crashes/lockups, on one of my main IP
masquerading boxes (I use IP masq extensively as an integral part of a
'paranoid' dial-in ISP service).

The problem has not completely been tracked down, but it appears that the
most likely explanation is related to IP masquerading (the machine doesn't
do much else), and repeated attempted connections from a user to an
unreachable host.

Interestingly enough, about a minute before the crash 'ipfwadm -M -ln' gave
the following output (unrelated entries removed):

udp 04:16.16 A.B.C.D 157.157.x.x 62468 (62469) -> 137
udp 04:16.11 A.B.C.D 157.157.x.x 62467 (62468) -> 137
udp 04:16.24 A.B.C.D 157.157.x.x 62470 (62471) -> 137
udp 04:16.22 A.B.C.D 157.157.x.x 62469 (62470) -> 137
udp 04:15.82 A.B.C.D 157.157.x.x 62464 (62465) -> 137
udp 04:15.80 A.B.C.D 157.157.x.x 62463 (62464) -> 137
udp 04:15.96 A.B.C.D 157.157.x.x 62466 (62467) -> 137
udp 04:15.87 A.B.C.D 157.157.x.x 62465 (62466) -> 137
udp 04:16.45 A.B.C.D 157.157.x.x 62475 (62476) -> 137
udp 04:16.30 A.B.C.D 157.157.x.x 62472 (62473) -> 137
udp 04:16.27 A.B.C.D 157.157.x.x 62471 (62472) -> 137
udp 04:16.37 A.B.C.D 157.157.x.x 62474 (62475) -> 137
udp 04:16.34 A.B.C.D 157.157.x.x 62473 (62474) -> 137
udp 04:17.89 A.B.C.D 157.157.x.x 62452 (62453) -> 137
udp 04:17.84 A.B.C.D 157.157.x.x 62451 (62452) -> 137
udp 04:18.01 A.B.C.D 157.157.x.x 62454 (62455) -> 137
udp 04:17.93 A.B.C.D 157.157.x.x 62453 (62454) -> 137
udp 04:17.71 A.B.C.D 157.157.x.x 62448 (62449) -> 137 9 !
udp 04:17.54 A.B.C.D 157.157.x.x 62447 (62448) -> 137 8 !
udp 04:17.82 A.B.C.D 157.157.x.x 62450 (62451) -> 137
udp 04:17.78 A.B.C.D 157.157.x.x 62449 (62450) -> 137
udp 04:15.77 A.B.C.D 157.157.x.x 62462 (62463) -> 137
udp 04:15.76 10.x.x.x 157.157.x.x 60373 (62462) -> 137
udp 04:17.35 A.B.C.D 157.157.x.x 62444 (62445) -> 137 5
udp 04:17.30 A.B.C.D 157.157.x.x 62443 (62444) -> 137 4
udp 04:17.44 A.B.C.D 157.157.x.x 62446 (62447) -> 137
udp 04:17.39 A.B.C.D 157.157.x.x 62445 (62446) -> 137
udp 04:17.06 10.x.x.x 157.157.x.x 60371 (62441) -> 137 1
udp 04:17.27 A.B.C.D 157.157.x.x 62442 (62443) -> 137 3
udp 04:17.22 A.B.C.D 157.157.x.x 62441 (62442) -> 137 2
udp 04:16.30 A.B.C.D 157.157.x.x 62472 (62473) -> 137
udp 04:16.27 A.B.C.D 157.157.x.x 62471 (62472) -> 137
udp 04:16.37 A.B.C.D 157.157.x.x 62474 (62475) -> 137
udp 04:16.34 A.B.C.D 157.157.x.x 62473 (62474) -> 137
udp 04:17.89 A.B.C.D 157.157.x.x 62452 (62453) -> 137
udp 04:17.84 A.B.C.D 157.157.x.x 62451 (62452) -> 137
udp 04:18.01 A.B.C.D 157.157.x.x 62454 (62455) -> 137
udp 04:17.93 A.B.C.D 157.157.x.x 62453 (62454) -> 137
udp 04:17.71 A.B.C.D 157.157.x.x 62448 (62449) -> 137 9 !
udp 04:17.54 A.B.C.D 157.157.x.x 62447 (62448) -> 137 8 !
udp 04:17.82 A.B.C.D 157.157.x.x 62450 (62451) -> 137
udp 04:17.78 A.B.C.D 157.157.x.x 62449 (62450) -> 137
udp 04:15.77 A.B.C.D 157.157.x.x 62462 (62463) -> 137
udp 04:15.76 10.x.x.x 157.157.x.x 60373 (62462) -> 137
udp 04:17.35 A.B.C.D 157.157.x.x 62444 (62445) -> 137
udp 04:17.30 A.B.C.D 157.157.x.x 62443 (62444) -> 137
udp 04:17.44 A.B.C.D 157.157.x.x 62446 (62447) -> 137 7
udp 04:17.39 A.B.C.D 157.157.x.x 62445 (62446) -> 137 6
udp 04:17.06 10.x.x.x 157.157.x.x 60371 (62441) -> 137
udp 04:17.27 A.B.C.D 157.157.x.x 62442 (62443) -> 137
udp 04:17.22 A.B.C.D 157.157.x.x 62441 (62442) -> 137

Here A.B.C.D is one of two IP addresses on the Masq box (the internal one),
10.x.x.x is a client with a dial-up connection, and 157.157.x.x is an
unreachable host. (Telnet 157.157.x.x replies with 'Network is
unreachable').

This looks to me like some sort of infinite loop has started (I've appended
sequencial numbers to some of the entries above - some even appear to be
duplicates - marked a couple with '!'). This looks *really* spooky to me.

Theory: Eventually the machine stops responding to the rest of the world,
and spends all its time looping these bogus masq. connections.

Is this possible? If so, have I discovered a kernel bug? :-)

This problem is relatively new - until a couple of weeks ago, the box had
2-3 months of continuous uptime. Moving the traffic to another box running
the same configuration caused it to crash as well (so this is a software or
configuration problem, not hardware).

I've installed kernel firewalling rules to block the offending connections,
and will continue monitoring the machine. If it doesn't crash again within
a few days, then I'll assume the above is the cause of my headaches - if it
turns out this was all a false alarm, I'll repost to let y'all know.

If any kernel gurus want to take a more detailed look at my configuration,
I'll be happy to send more detailed information - I just didn't want to
announce all my firewalling rules and hidden IP networks on an open mailing
list.

Thanks in advance, and happy new years!

-- 
Bjarni R. Einarsson                     [ PGP: 02764305 / B7A3AB89 ]
 bre@margmidlun.is -=- http://www.mmedia.is/~bre/ -=- Juggler@IRCnet

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/