Re: Routes disappear [WAS Re: Alpha: SYN-cookie problem. routes

Simon Kirby (sim@netnation.com)
Thu, 11 Mar 1999 11:43:21 -0800 (PST)


This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.
Send mail to mime@docserver.cac.washington.edu for more info.

---1362361889-131164362-921181401=:29981
Content-Type: TEXT/PLAIN; charset=US-ASCII

I believe I know why a lot of people are reporting problems like this and
why a lot of people think that the synbomb protection stuff isn't working.

(Background:)

Here at NetNation, we have a machine dedicated to routing traffic from
four physical networks. One of the connections goes down to a company
down a few floors in the building which is our Internet uplink. The
machine handles about 8 MBit/traffic sec on average from about 20 other
web servers, mail servers, dedicated servers, etc.

About 6 months ago when the traffic was a bit less, we began to notice
that routes would randomly stop working for a few seconds and then work
again...Usually after they've been idle for a while. It began to get
worse and worse, and it was really quite difficult to figure out what was
happening. Whenever I tried to get a tcpdump, it would always start
working. As time went on, it got more and more frustrating as a lot more
people were seeing the problem and other staff were reporting that they
randomly weren't able to access their mail, etc. So, I spent several days
doing all I could to try to track it down...

Eventually, I found that it might be related to ARPs not working or the
ARP table being limited by a maximum size...I checked the kernel and
didn't find anything...checked the switch manual and it was well under
the max...so I dug further and finally discovered that it might be
something to do with the route cache stuff. So, I added a whole wad of
printk()s and it turned out to be that the garbage collection stuff for
the route cache had a _minimum_ cleanup interval that was by default far
too long and the route cache table size was hitting the root, causing
ip_route_output_slow()'s call to dst_alloc() to fail. When this happens,
forwarding packets are just dropped and no message to track it down is
produced.

(Solution:)

Sure enough, this was the problem. I traced around through all the code
and found that the settings linked to some /proc files -- tweaking them
caused the problem to eventually go away completely. I had tweaked them
quite stupidly (not understanding what they did), but in a message on
January 28th, Alexey Kuznetsov sent me a message explaining that I should
decrease gc_min_interval but not touch the gc_max_size (because it's the
maximum number of records to fit in a fixed-size hash table). After doing
this, everything was fine and dandy. Of course, gc_min_interval is in
seconds, and so eventually it's going to be a problem again (it already
defaults to 5, and I set it to 1).

So, I have a question... Why the heck is there a minimum cleanup interval,
anyway? Shouldn't it just clean up as much as it needs to? I can
understand how cleanup later would make routing a bit faster, but sticking
a minimum time limit on how frequently it can clean up is kind of silly in
my opinion...

(Relating to Synbombing:)

Anyway...The reason why I think this is related to synbombing stuff is
because when people usually get synbombed, the idea is that the host
sending the bomb is supposed to spoof each SYN from a bunch of random IPs
in hopes that most of them will not reply with an RST and the host will be
"stuck" with a bunch of IPs that don't say "no, I didn't ask for that, go
away". So, if more than /proc/sys/net/ipv4/route/max_size fake IPs are
attempted to be replied to with a syn cookie (or another SYN or whatever)
and the garbage cleanup isn't going to remove them in time, it's going to
hit the roof and cause new routes not to be allocated again.

So, I tested my theory. I pulled out an old syn bombing program I wrote
to test on the IRC server I was writing a looooong time ago, and bombed my
IRC server. Sure enough, afterwards I was not able to ping another
machine on the network:

...ermm...Hrm, I just ran it again and realized that I can lock up my
machine hard by synbombing myself (2.2.3ac1). Odd. I will look at this
further in a bit...perhaps it has something to do with my printk patch
(below). Anyway, try #2:

...ermm....okay, this time it rebooted itself. Very interesting. Lets
try with a few less packets...

...arf, dead again, and this time it overwrote my first superblock with
crap. There's something very bad happening.

Anyway, what happened is that I got "ENOBUFS" from ping for a second or so
and then it tried again and had enough room in the route cache table to
add the route from ip_route_output_slow(). So, I'm guessing this is what
was happening.

As a test, perhaps try doing this to see if it makes any difference:

echo 0 > /proc/sys/net/ipv4/route/gc_min_interval

...This should allow it to clean up as much as it needs to.

Also, you may want to try applying the attached patch to see if it's
really hitting this limit. I recommend that something like it should
actually go into the kernel, because the problem really bit me hard on the
routing machine and it took forever for me to track it down. Of course,
something like the attached patch is probably not very safe because it
will spam you like crazy if more and more packets without cached routes
are trying to go through...It'd need something to throttle the message.

Let me know if it makes any difference.

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |

---
On Thu, 11 Mar 1999, Michael Hasenstein wrote:

> forwarded messge below. > Thought it was interesting enough. > Remark: after the attack there are lines like the following in messages: > Mar 8 00:10:55 tantalus sshd[242]: error: accept: No route to host > > looks like the routes have vanished, but the interface is still active > > attached is the kernel configuration, just in case > > one additional output from messages not included in email below (111.111.111.111 and 'tantalus' inserted by me, don't know if the guy sending the email wants his hosts IP/name in the public). > > Mar 8 00:09:04 tantalus kernel: Warning: possible SYN flood from 195.186.5.114 on 111.111.111.111:80. Sending cookie > s. > Mar 8 00:10:05 tantalus kernel: Warning: possible SYN flood from 212.40.5.72 on 111.111.111.111:80. Sending cookies. > Mar 8 00:10:55 tantalus sshd[242]: error: accept: No route to host > Mar 8 00:11:40 tantalus sshd[242]: error: accept: No route to host > Mar 8 00:11:47 tantalus kernel: Warning: possible SYN flood from 194.148.13.250 on 111.111.111.111:80. Sending cooki > es. > Mar 8 00:12:25 tantalus sshd[242]: error: accept: No route to host > Mar 8 00:13:13 tantalus sshd[242]: error: accept: No route to host > ---------------------------------------- > > Hi, > > I am having strange problems with my Alphastation 500, now running Kernel > 2.2.3 (the problem turned up with kernel 2.0.35 and 2.2.2ac7). That when the > maschine gets attacked, the network interface (eth0) stops working. What > happens : I can ping my own eth0 from the console, but pinging the network > times out. Likewise an other host on the internet cannot ping my Server. > > The fact that this accured with 2.2.2ac7 and 2.0.35 leads me to belive it > may not be kerrnel problem. But what i found strange was the server was up > the whole day. syn_cookie support was in the kernel, but not enabled. at > about 22:00 CET i enabled that with "echo 1 > > /proc/sys/net/ipv4/tcp_syncookies" and while later (about 20 min - 30 min) > the connection died on me ( i was logged in with telnet) .. and after that > there was no more answer till i reset the server this morning. > > The Server is Running : > > Kernel 2.2.3 (was 2.2.2ac7 yesterday) > Apache 1.3.4 & PHP 3.07 & Ben-SSL 1.3.1 > Sendmail 8.9.3 > > I compiled thease programs with egcs 1.1.1 > > /var/log/messages shows : > > Mar 10 22:54:38 tantalus kernel: possible SYN flooding on port 80. Sending > cookies. > Mar 10 22:55:39 tantalus kernel: possible SYN flooding on port 80. Sending > cookies. > Mar 10 22:56:39 tantalus kernel: possible SYN flooding on port 80. Sending > cookies. > Mar 10 22:57:39 tantalus kernel: possible SYN flooding on port 80. Sending > cookies. > <snip> > Mar 11 02:13:13 tantalus kernel: possible SYN flooding on port 80. Sending > cookies. > > One other wierd thing : named was not running .. and there were no hourly > stats in the log file .. > hmm wonder if thats got anything to do with it... > > no other strange log messages .. > > Update : i just had an other telnet session open ( this time 2.2.3) and the > network I/F died again :(( > I booted the alpha about 1,5 hours ago .. > and again . in /var/log/messages > > Mar 11 11:45:13 tantalus kernel: possible SYN flooding on port 80. Sending > cookies > <snip> > Mar 11 11:55:08 tantalus kernel: possible SYN flooding on port 80. Sending > cookies. > > > > I will update BIND, inetd , and net-tools , but if that doesn't help .. i am clueless > > and disable syn_cookie support . .(could it be buggy ?) > > and boy are my customers unhappy > > > Regards > (someone) > == > -- > Michael Hasenstein > http://www.csn.tu-chemnitz.de/~mha/ > Private Pilot (ASEL) since 1998 > _________________________________________________________ > DO YOU YAHOO!? > Get your free @yahoo.com address at http://mail.yahoo.com >

---1362361889-131164362-921181401=:29981 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="linux-2.2.0+dst_alloc_gc_warning.patch" Content-Transfer-Encoding: BASE64 Content-ID: <Pine.LNX.4.10.9903111143210.29981@peace.netnation.com> Content-Description: Content-Disposition: attachment; filename="linux-2.2.0+dst_alloc_gc_warning.patch"

LS0tIGxpbnV4L25ldC9jb3JlL2RzdC5jLm9yaWcJVHVlIEphbiAyNiAwOTox Mzo1NSAxOTk5DQorKysgbGludXgvbmV0L2NvcmUvZHN0LmMJVHVlIEphbiAy NiAwOToxNzoyNiAxOTk5DQpAQCAtODEsOCArODEsMTAgQEANCiAJc3RydWN0 IGRzdF9lbnRyeSAqIGRzdDsNCiANCiAJaWYgKG9wcy0+Z2MgJiYgYXRvbWlj X3JlYWQoJm9wcy0+ZW50cmllcykgPiBvcHMtPmdjX3RocmVzaCkgew0KLQkJ aWYgKG9wcy0+Z2MoKSkNCisJCWlmIChvcHMtPmdjKCkpew0KKwkJCXByaW50 ayhLRVJOX0NSSVQgImRzdF9hbGxvYzogZ2NfdGhyZXNoIGV4Y2VlZGVkIVxu Iik7DQogCQkJcmV0dXJuIE5VTEw7DQorCQl9DQogCX0NCiAJZHN0ID0ga21h bGxvYyhzaXplLCBHRlBfQVRPTUlDKTsNCiAJaWYgKCFkc3QpDQo= ---1362361889-131164362-921181401=:29981--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/