Re: [PATCH v2 20/23] netoops: Add x86 specific bits to packet headers

From: Mike Waychison
Date: Tue Nov 09 2010 - 12:56:33 EST


On Tue, Nov 9, 2010 at 6:22 AM, Neil Horman <nhorman@xxxxxxxxxxxxx> wrote:
> On Mon, Nov 08, 2010 at 12:33:35PM -0800, Mike Waychison wrote:
>> We need to be able to gather information about the CPUs that caused the crash.
>>
>> This commit only handles x86, but it is desirable to come up with some new
>> packet format that can accommodate any architecture.
>>
>> Signed-off-by: Mike Waychison <mikew@xxxxxxxxxx>
>> ---
>> TODO: This should be made more general to other architectures.  As is, we are
>> probably okay exporting some value for the 'arch' field.  Different
>> architectures though will likely want to gather different data.
>> ---
>>  drivers/net/netoops.c |   27 +++++++++++++++++++++------
>>  1 files changed, 21 insertions(+), 6 deletions(-)
>>
> Not sure I see the value in encapsulating arch specific data in a netoops
> message.  Ostensibly this information can be inferred at the time of the crash
> by the name/ip of the system crashing (one presumes that the sysadmin knows what
> systems are what arch, or can look it up easily).

This is actually harder than it first appears. Because our systems
are distributed, we can never rely on a central data source that
describes our machines without having to worry about network
partitions and service downtime. The alternative is to post-process
crashes, looking up machine information in various data sources and
hoping that the results are consistent. That becomes yet another job
in the cluster, which seems a little silly when the machine could
simply describe itself at the time of the crash.

>
> If that's not the case, why not just dump out the contents of /proc/cpuinfo in
> ascii form, so that no arch specific data is needed?

As a segment of the dump? I'm okay with doing this, as long as it
never makes its way into log_buf. log_buf is a real pain to parse
given the lack of transactions and the fact that many other cores may
be scribbling all over it.

A couple of years ago, we specced out a different wire protocol for
these packets, version 3 (yes, this has already had a version bump).
We came up with a design that used (key,length)->value fields. Keys
were designed to be 16-bit integers, and clients could easily ignore
fields they don't understand. We never implemented this, but it'd be
great if folks bought into it. It would let us ship things like file
contents side by side with other structured fields such as pt_regs
snapshots, the log_buf and a user-defined buffer.
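To make the (key,length)->value idea concrete, here is a minimal
userspace sketch in C. Only the 16-bit key width comes from the
description above. The 16-bit length, big-endian byte order and the
NETOOPS_KEY_* values are assumptions for illustration, not the actual
version 3 spec. The reader shows how a client skips fields with keys
it does not understand by using the length alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key assignments; the real v3 key numbers are not given. */
enum netoops_key {
	NETOOPS_KEY_LOG_BUF  = 1,
	NETOOPS_KEY_PT_REGS  = 2,
	NETOOPS_KEY_CPUINFO  = 3,	/* e.g. ASCII /proc/cpuinfo text */
	NETOOPS_KEY_USER_BUF = 4,
};

/* Assumed: 2-byte key + 2-byte length header, big-endian. */
#define TLV_HDR_LEN 4u

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
}

static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Append one (key, length) -> value field at *off in buf.
 * Returns 0 on success, -1 if the field does not fit in cap bytes.
 */
static int tlv_put(uint8_t *buf, size_t cap, size_t *off,
		   uint16_t key, const void *val, uint16_t len)
{
	assert(*off <= cap);	/* invariant: offset never passes capacity */
	if (cap - *off < TLV_HDR_LEN + len)
		return -1;
	put_be16(buf + *off, key);
	put_be16(buf + *off + 2, len);
	memcpy(buf + *off + TLV_HDR_LEN, val, len);
	*off += TLV_HDR_LEN + len;
	return 0;
}

/*
 * Return a pointer to the value of the first field with the given key
 * and store its length in *len, or return NULL if it is absent or the
 * packet is truncated. Unknown keys are skipped using their length.
 */
static const uint8_t *tlv_find(const uint8_t *buf, size_t size,
			       uint16_t key, uint16_t *len)
{
	size_t off = 0;

	while (size - off >= TLV_HDR_LEN) {
		uint16_t k = get_be16(buf + off);
		uint16_t l = get_be16(buf + off + 2);

		if (size - off - TLV_HDR_LEN < l)
			return NULL;	/* field runs past end of packet */
		if (k == key) {
			*len = l;
			return buf + off + TLV_HDR_LEN;
		}
		off += TLV_HDR_LEN + l;	/* skip this field, known or not */
	}
	return NULL;
}
```

Because every field carries its own length, an older client can walk
past a field added later (say, a new architecture-specific blob)
without needing to know its layout.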

How do folks feel about something like that?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/