Re: KDB in the mainstream 2.4.x kernels?

From: linas@austin.ibm.com
Date: Fri Jul 18 2003 - 19:31:08 EST


Hi Andi,

I'm happy to get a response...

On Fri, Jul 18, 2003 at 10:43:57PM +0200, Andi Kleen wrote:
>
> One argument i have against it: KDB is incredibly ugly code.
> Before it could be even considered for merging it would need quite a lot
> of cleanup.

What in particular? I just looked at kdb/kdbmain.c and kdb/kdb_bt.c
and it looks fine to me; fairly minimal even. I don't know about
arch-specific code. Is there a particular file you're complaining about?

(very very off-topic: I love reading about the neat things that
reiser v4 will do, but wow, every time I read reiserfs code, 'ugly'
is indeed the word that flies to mind).

> I actually started on porting the KDB backtracer recently to get
> reliable frame pointer based backtraces, but it turns out the code
> for that is so complicated and ugly that the chances of ever merging
> it would be very slim.
?

I have not (yet?) studied the code in detail, so point me at something
ugly; I'm not sure what you are talking about. Now, stack traces are
in general ugly because registers and args are splattered all over
the place, and the struct pt_regs are even worse. So there's some
inherent ugliness there ...

Since I live in KDB, I might have some spare time to clean up, fix, etc.,
so now's a good time to talk ...

> As for your home crash debugging I suspect you would be better off with LKCD
> or Mcore (crash dump, saving a crash image on some partition, with examining
> the crash dump after reboot)

I'll look ... given that I own lots of flaky IDE hardware, though, I'm cautious.
I get 'DriveReady SeekComplete Error' messages daily ... I learned the hard
way that these aren't necessarily the fault of the hard drive, and I have
suffered through corrupted fs's as a result ...

Generically, for servers, if you can just save the dump, reboot, and
let the server go on, and analyze the dump at leisure, that is the
preferred way to do things. Especially if you are doing customer support.
(Linux is at the dawn of the era of having customers who have actually
spent in excess of $100K or $1M on a server and who will be going
apoplectic when it crashes. This will put a spotlight on dump tools).

> KDB is usually not useful for debugging hangs on desktop boxes (and even
> many servers) because you have usually X running. When the machine crashes and
> goes in KDB you cannot see the text output and debug anything. I learned

I'm willing to put the console on a serial port. I've got enough machines
& serial cables, this doesn't bother me.

> Disadvantage is that the current crash dump mechanisms (lkcd, mcore
> crash, netdump) are all still not very reliable and have horrible
> error handling.

This statement makes me nervous. One of the worst feelings one can get
when debugging is not being able to trust the data you are looking at.
It's too easy to lose a lot of time (and credibility) forming incorrect
hypotheses based on bad data.

Dedicating a partition that is unformatted, and whose sole purpose
in life is to record a dump -- that is a viable option, at least on
servers, where high uptime is more important, and storage is cheap.

On my home machines, it's sort of the other way around: I don't trust
IDE, I don't have the disk space.

But you convinced me; I need more time on lkcd, etc.

--linas
