Re: [PATCH] x86_64: Limit the number of processor bootup messages

From: Mike Travis
Date: Mon Nov 02 2009 - 14:22:03 EST




Andi Kleen wrote:
Mike Travis wrote:

This set of patches limits the number of repetitious messages which contain
no additional information. Much of this information is obtainable from
/proc and sysfs. Most of the messages are also sent to the kernel log
buffer as KERN_DEBUG messages, so they can still be used to examine any
processor-specific details more closely.
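The "CPUS: 0-639" form in the summary output compresses a set of CPUs into ranges. Here is a minimal userspace sketch of that compression. It assumes a sorted, duplicate-free array of CPU numbers. `format_cpu_ranges` is a hypothetical name; the kernel has its own cpumask list-formatting helpers for this job.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a sorted, duplicate-free list of CPU numbers
 * as comma-separated ranges, e.g. {0,1,2,3,8} -> "0-3,8".
 * Returns the number of characters written (excluding the NUL), or -1 if
 * the output buffer is too small.
 */
static int format_cpu_ranges(const int *cpus, size_t n, char *buf, size_t len)
{
    size_t pos = 0;
    size_t i = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    while (i < n) {
        size_t j = i;
        int w;

        /* Extend the run while the CPU numbers stay consecutive. */
        while (j + 1 < n && cpus[j + 1] == cpus[j] + 1)
            j++;

        if (j == i)
            w = snprintf(buf + pos, len - pos, "%s%d",
                         pos ? "," : "", cpus[i]);
        else
            w = snprintf(buf + pos, len - pos, "%s%d-%d",
                         pos ? "," : "", cpus[i], cpus[j]);

        if (w < 0 || (size_t)w >= len - pos)
            return -1;  /* output would be truncated */
        pos += (size_t)w;
        i = j + 1;
    }
    return (int)pos;
}
```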

What would be good is to put the information from the booting CPUs
into some buffer and print it visibly if there's a timeout detected on the BP.

What do you think of this idea: add a "mark kernel log buffer" function,
and then if any KERN_NOTICE or higher-priority message is logged, send the
marked info from the kernel log buffer to the console before the current
message. Setting the marker to '0' clears it.
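Here is a rough userspace model of the "mark kernel log buffer" idea. It assumes one flat buffer and the kernel's numeric loglevels (KERN_ERR = 3, KERN_NOTICE = 5, KERN_DEBUG = 7; a lower number is more severe). All names (`marked_log`, `log_mark`, `log_msg`) are hypothetical. The "console" is just a second buffer here.

```c
#include <stddef.h>
#include <string.h>

/* Kernel loglevel numbers: a lower number is more severe. */
#define LVL_ERR    3
#define LVL_NOTICE 5
#define LVL_INFO   6
#define LVL_DEBUG  7

#define LOG_SIZE 4096

struct marked_log {
    char log[LOG_SIZE];      /* everything logged, like the kernel log buffer */
    size_t log_len;
    char console[LOG_SIZE];  /* what would have been printed on the console */
    size_t con_len;
    size_t mark_pos;         /* log offset recorded by log_mark() */
    int marked;              /* 0 = no mark set */
};

/* Append n bytes of s to dst, truncating instead of overflowing. */
static void buf_append(char *dst, size_t *len, const char *s, size_t n)
{
    if (n > LOG_SIZE - 1 - *len)
        n = LOG_SIZE - 1 - *len;
    memcpy(dst + *len, s, n);
    *len += n;
    dst[*len] = '\0';
}

static void log_init(struct marked_log *l)
{
    memset(l, 0, sizeof(*l));
}

/* Remember where the interesting region (e.g. one AP's bringup) starts. */
static void log_mark(struct marked_log *l)
{
    l->mark_pos = l->log_len;
    l->marked = 1;
}

static void log_clear_mark(struct marked_log *l)
{
    l->marked = 0;
}

static void log_msg(struct marked_log *l, int level, const char *msg)
{
    size_t n = strlen(msg);

    if (level <= LVL_NOTICE) {
        /* Important message: first replay the quiet messages since the mark. */
        if (l->marked) {
            buf_append(l->console, &l->con_len,
                       l->log + l->mark_pos, l->log_len - l->mark_pos);
            log_clear_mark(l);   /* replay only once (a choice of this sketch) */
        }
        buf_append(l->console, &l->con_len, msg, n);
    }
    buf_append(l->log, &l->log_len, msg, n);
}
```

This sketch replays the marked region once and then clears the mark. That choice stops a later NOTICE from dumping the same text a second time.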

I was also thinking that you might want to print the history of the previous
CPU that booted OK before printing the info for the CPU that didn't. That
way you'd have some data to compare it with.
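Here is a sketch of dumping the last good CPU's history ahead of the failing one. It assumes each AP's boot messages were captured in a small per-CPU buffer. The buffer layout, `hist_record` and `dump_on_timeout` are all hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CPUS 8
#define HIST_LEN 256

/* Hypothetical per-CPU record of the messages each AP produced during bringup. */
struct cpu_boot_hist {
    int booted_ok;
    char text[HIST_LEN];
};

static struct cpu_boot_hist hist[MAX_CPUS];

/* Append a message to @cpu's history, truncating if its buffer is full. */
static void hist_record(int cpu, const char *msg)
{
    size_t used = strlen(hist[cpu].text);

    snprintf(hist[cpu].text + used, HIST_LEN - used, "%s", msg);
}

/*
 * On a bringup timeout for @failed, write the history of the closest
 * earlier CPU that booted OK (if any), followed by the failing CPU's own
 * history, so the two can be compared. @len must be non-zero.
 */
static void dump_on_timeout(int failed, char *out, size_t len)
{
    int prev = -1;
    int w = 0;

    for (int c = failed - 1; c >= 0; c--) {
        if (hist[c].booted_ok) {
            prev = c;
            break;
        }
    }

    out[0] = '\0';
    if (prev >= 0) {
        w = snprintf(out, len, "CPU%d (ok):\n%s", prev, hist[prev].text);
        if (w < 0 || (size_t)w >= len)
            return;   /* out of space: keep what fit */
    }
    snprintf(out + w, len - (size_t)w, "CPU%d (timeout):\n%s",
             failed, hist[failed].text);
}
```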


Also, power of two summaries are a bit odd, but ok.

For Processor Information printout:

[ 90.968381] Summary Processor Information for CPUS: 0-639
[ 90.972033] Genuine Intel(R) CPU 0000 @ 2.13GHz stepping 04

It would be good to print family/model in this line

Is there more info that should be printed? I'm just calling the current
print_cpu_info using the cpuinfo_x86 for the first CPU in the list, and
it appears to be printing the x86_model_id. Is there some other info
in that struct that should be printed?
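Here is one way the summary line could also carry family and model, shown as a userspace sketch. `struct cpu_ident` is a hypothetical stand-in whose fields are loosely modeled on cpuinfo_x86. The output format is only a suggestion.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in loosely modeled on the kernel's cpuinfo_x86. */
struct cpu_ident {
    unsigned int family;
    unsigned int model;
    unsigned int stepping;
    const char *model_id;   /* e.g. "Genuine Intel(R) CPU 0000 @ 2.13GHz" */
};

/* One possible summary format with family/model added next to the name. */
static int format_cpu_summary(const struct cpu_ident *c, char *buf, size_t len)
{
    return snprintf(buf, len, "%s (family: 0x%x, model: 0x%x, stepping: 0x%x)",
                    c->model_id, c->family, c->model, c->stepping);
}
```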


[ 90.981402] CPU: L1 I cache: 32K, L1 D cache: 32K
[ 90.985888] CPU: L2 cache: 256K
[ 90.988032] CPU: L3 cache: 24576K

I would recommend dropping the cache information; it can easily be
obtained at runtime and is often implied by the CPU name anyway
(L1 in particular, and increasingly L2, change only very rarely).
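As an example of getting cache sizes at runtime: on Linux, sysfs exposes each cache under /sys/devices/system/cpu/cpuN/cache/indexM/, with `level`, `type` and `size` files. The minimal sketch below reads and parses `size`. The mapping from index to cache level varies, so a real tool would also read `level` and `type`. The helper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a sysfs cache size string such as "32K" or "24576K\n" into bytes. */
static long parse_cache_size(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);

    if (end == s || v < 0)
        return -1;
    if (*end == 'K')
        v *= 1024L;
    else if (*end == 'M')
        v *= 1024L * 1024L;
    else if (*end != '\0' && *end != '\n')
        return -1;
    return v;
}

/*
 * Read the size of cache @index for @cpu from sysfs. Returns -1 if the
 * file is missing (no sysfs, non-Linux, or no such cache index).
 */
static long read_cache_size(int cpu, int index)
{
    char path[128], line[64];
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cache/index%d/size", cpu, index);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(line, sizeof line, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_cache_size(line);
}
```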

OK, though because of future system upgrades to a UV system, you can
end up with slightly different processors (of the same family). The
only difference I've detected so far in testing is that the stepping
has changed.
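If a system can mix processors of the same family with different steppings, the summary could print the common line once. It would then call out only the CPUs that differ from CPU 0. Here is a sketch under that assumption. `struct cpu_sig` and `find_outliers` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-CPU signature used only for the comparison. */
struct cpu_sig {
    unsigned int family, model, stepping;
};

/*
 * Compare every CPU against CPU 0. Indices of CPUs whose signature differs
 * are stored in @odd (at most @max of them); the return value is the total
 * number of differing CPUs, which may exceed @max.
 */
static size_t find_outliers(const struct cpu_sig *cpus, size_t n,
                            size_t *odd, size_t max)
{
    size_t count = 0;

    for (size_t i = 1; i < n; i++) {
        const struct cpu_sig *a = &cpus[0], *b = &cpus[i];

        if (a->family != b->family || a->model != b->model ||
            a->stepping != b->stepping) {
            if (count < max)
                odd[count] = i;
            count++;
        }
    }
    return count;
}
```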


[ 90.992032] MIN 4266.68 BogoMIPS (lpj=8533371)
[ 91.000033] MAX 4267.89 BogoMIPS (lpj=8535789)

Perhaps an average too? You could put all that on one line.
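Here is a sketch of the one-line MIN/AVG/MAX version. It formats loops_per_jiffy the way the kernel's BogoMIPS message does: the integer part is lpj/(500000/HZ), and the two decimals are (lpj/(5000/HZ)) % 100. It assumes HZ=250, which matches the sample lines above (lpj=8533371 gives 4266.68). `ASSUMED_HZ` and the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define ASSUMED_HZ 250   /* assumption: consistent with the sample output */

/* Format lpj as BogoMIPS: lpj/(500000/HZ) "." (lpj/(5000/HZ)) % 100 */
static int fmt_bogomips(unsigned long lpj, char *buf, size_t len)
{
    return snprintf(buf, len, "%lu.%02lu",
                    lpj / (500000 / ASSUMED_HZ),
                    (lpj / (5000 / ASSUMED_HZ)) % 100);
}

/* One-line MIN/AVG/MAX summary over per-CPU lpj values; requires n >= 1. */
static int bogomips_summary(const unsigned long *lpj, size_t n,
                            char *buf, size_t len)
{
    unsigned long min = lpj[0], max = lpj[0];
    unsigned long long sum = 0;
    char smin[32], savg[32], smax[32];

    for (size_t i = 0; i < n; i++) {
        if (lpj[i] < min)
            min = lpj[i];
        if (lpj[i] > max)
            max = lpj[i];
        sum += lpj[i];
    }
    fmt_bogomips(min, smin, sizeof smin);
    fmt_bogomips((unsigned long)(sum / n), savg, sizeof savg);
    fmt_bogomips(max, smax, sizeof smax);
    return snprintf(buf, len, "BogoMIPS MIN %s AVG %s MAX %s",
                    smin, savg, smax);
}
```

With the two sample lpj values, the line reads `BogoMIPS MIN 4266.68 AVG 4267.29 MAX 4267.89`.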

Sure thing.


These lines have been moved to loglevel KERN_DEBUG:

CPU: Physical Processor ID:
CPU: Processor Core ID:
CPU %d/0x%x -> Node %d
<cache line sizes per cpu>

I think you can just remove them.

I left them in, in case we get to the point of printing KERN_DEBUG
messages when a failure occurs. Do you think they will not be
needed in that case? (I also kept them as KERN_DEBUG instead of
pr_debug, since the latter optimizes out the print if DEBUG
is not defined, which it won't be in 99% of the kernels our
customers run. Generally, it's better to get as much good
information as possible as early as possible after a failure, instead
of attempting to recreate the failure with a "debug" kernel;
scheduling time on the system can sometimes be a real pain.)
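The userspace model below illustrates the KERN_DEBUG vs. pr_debug distinction. A KERN_DEBUG printk still lands in the log buffer (dmesg), even though it stays off the console. A pr_debug call compiles away when DEBUG is not defined. The model assumes a console_loglevel of 7, the usual default. It ignores CONFIG_DYNAMIC_DEBUG, which lets pr_debug sites be enabled at runtime. `my_printk` and `my_pr_debug` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Loglevels as in the kernel: 6 is KERN_INFO, 7 is KERN_DEBUG. */
#define LVL_INFO  6
#define LVL_DEBUG 7

static char log_buf[1024];   /* stands in for the kernel log buffer (dmesg) */
static char console[1024];   /* what actually reaches the console */
static int console_loglevel = 7;  /* assumed default: KERN_DEBUG stays off */

static void buf_add(char *dst, size_t cap, const char *s)
{
    size_t used = strlen(dst);

    snprintf(dst + used, cap - used, "%s", s);
}

/* printk model: always logged, printed only if more severe than the console level. */
static void my_printk(int level, const char *msg)
{
    buf_add(log_buf, sizeof log_buf, msg);
    if (level < console_loglevel)
        buf_add(console, sizeof console, msg);
}

/* pr_debug model: without DEBUG the call compiles to nothing, so the
 * message never reaches even the log buffer. */
#ifdef DEBUG
#define my_pr_debug(msg) my_printk(LVL_DEBUG, msg)
#else
#define my_pr_debug(msg) do { } while (0)
#endif
```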


CPUx is down

This should still be printed if there's a timeout, or rather, print
a "CPUx is not down" message. Right now there's no timeout detection on
shutdown, but I guess that wouldn't be too hard to add.
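Here is a sketch of the shutdown-side timeout. It polls the dying CPU's state a bounded number of times and reports "CPUx is not down" only when that CPU never reports in. The state enum, `wait_cpu_down` and the polling scheme are assumptions, not the existing x86 code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-CPU shutdown state, written by the dying CPU. */
enum cpu_state { CPU_ONLINE_STATE, CPU_DEAD_STATE };

/*
 * Poll @cpu's state up to @tries + 1 times, calling @wait (e.g. a 100 ms
 * sleep; may be NULL) between polls. On success write "CPUx is down" (a
 * candidate for KERN_DEBUG) and return 0; on timeout write
 * "CPUx is not down" (worth printing visibly) and return -1.
 */
static int wait_cpu_down(volatile enum cpu_state *state, int cpu, int tries,
                         void (*wait)(void), char *msg, size_t len)
{
    for (int i = 0; ; i++) {
        if (state[cpu] == CPU_DEAD_STATE) {
            snprintf(msg, len, "CPU%d is down", cpu);
            return 0;
        }
        if (i >= tries)
            break;
        if (wait)
            wait();
    }
    snprintf(msg, len, "CPU%d is not down", cpu);
    return -1;
}
```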

That seems a bit outside the scope of this patch...?


-Andi

Thanks!
Mike