Re: Consistent kernel hang during heavy TCP connection handling load

From: Bill Davidsen
Date: Thu Sep 30 2004 - 11:36:19 EST


Andrew A. wrote:
Jan,

Thanks for responding. When I got no responses, I searched for ways to get more data out of the kernel--I must say that it has been
quite a journey to identify what is working, where to get it, and how to install it when it comes to kernel
debugging/crash-data-gathering tools. LKCD for example, is not available at the location you'll eventually arrive at if you search
for it in google ... it's not obvious what it's state is (current/defunct/superceded), there's KDB, KGDB, netdump, netconsol,
netlog, diskdump (conusingly known as lkdump) etc. etc. And then, even if you do figure out what tools are current, you then have
to match the tool to the particular kernel version you are running -- which can be a task and a half unto itself.

Is diskdump available for 2.4? Can anyone comment on the choice of tools below?

Anyway, I have also done all of the following:

(1) Enabled netdump/netconsole on 2.6.8.1-521 Fedora Core kernel, after first fixing the startup scripts. Fixes can be found at
www.memeplex.com/Linux.html Note that after I also fixed crash.c to be a 2.6 compliant kernel module, and loading it to test
netdump, I always end up with a vmcore-incomplete image approx 45k in size, on the netdump-server. Can anyone tell me if this is
absurdly small, and if so, what might be the solution? The client box always reboots so I suspect too-small timeouts are the issue.

My experience is 100% with RH kernels, but the dump should be about memory size, in my case 2.5G or 4G and it is. But I did see hangs which resulted in the size you mention, a few k and hang.

There was a patch floating around to write a core image to a disk partition like Solaris, AIX, and other commercial systems, but Linus was opposed for some reason I remember as "I don't need this and it culd be dangerous" or similar. If that can be retrofitted to a current kernel it would be more useful than netdump, I suspect.

In any case, the short answer is that what you see is way too short, it sounds like the header info on config, registers, or somesuch that netdump sends first before the core.

--
-bill davidsen (davidsen@xxxxxxx)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/