Re: [RFC PATCH 2/7] printk: Simple implementation for NMI backtracing

From: Daniel Thompson
Date: Thu Mar 19 2015 - 14:48:20 EST


On 19/03/15 18:30, Peter Zijlstra wrote:
On Thu, Mar 19, 2015 at 01:39:58PM -0400, Steven Rostedt wrote:
+void printk_nmi_backtrace_complete(void)
+{
+	struct nmi_seq_buf *s;
+	int len, cpu, i, last_i;
+
+	/*
+	 * Now that all the NMIs have triggered, we can dump out their
+	 * back traces safely to the console.
+	 */
+	for_each_possible_cpu(cpu) {
+		s = &per_cpu(nmi_print_seq, cpu);
+		last_i = 0;
+
+		len = seq_buf_used(&s->seq);
+		if (!len)
+			continue;
+
+		/* Print line by line. */
+		for (i = 0; i < len; i++) {
+			if (s->buffer[i] == '\n') {
+				print_seq_line(s, last_i, i);
+				last_i = i + 1;
+			}
+		}
+		/* Check if there was a partial line. */
+		if (last_i < len) {
+			print_seq_line(s, last_i, len - 1);
+			pr_cont("\n");
+		}
+
+		/* Wipe out the buffer ready for the next time around. */
+		seq_buf_clear(&s->seq);
+	}
+
+	clear_bit(0, &nmi_print_flag);
+	smp_mb__after_atomic();

Is this really necessary? What is the mb synchronizing?

[ Added Peter Zijlstra to confirm it's not needed ]

It surely looks suspect; and it lacks a comment, which is a clear sign
it's buggy.

Now if it tries to order the accesses to the seqbuf against the clearing
of the bit, one would have expected a _before_ barrier, not an _after_.
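
To illustrate the _before_ versus _after_ point, here is a minimal userspace sketch using C11 atomics rather than the kernel primitives. Everything in it (demo_buf, demo_busy, demo_finish(), demo_try_start()) is a hypothetical stand-in and not code from the patch. The release fence placed before the flag is cleared plays roughly the role that smp_mb__before_atomic() ahead of clear_bit() would play: it keeps the earlier buffer accesses from being reordered past the point where the flag becomes free. The claiming side is assumed to take the flag with an acquiring compare-and-swap, loosely analogous to test_and_set_bit().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the shared buffer and the "in progress" flag. */
static char demo_buf[64];
static atomic_bool demo_busy = true;	/* starts out claimed by the current owner */

/* Current owner: finish with the buffer, then release the flag. */
static void demo_finish(void)
{
	/* Wipe the buffer ready for the next user. */
	memset(demo_buf, 0, sizeof(demo_buf));

	/*
	 * Order the buffer accesses above before the flag clear below.
	 * A barrier placed after the clear would not give this guarantee.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&demo_busy, false, memory_order_relaxed);
}

/* Next user: claim the flag first, and only then touch the buffer. */
static bool demo_try_start(const char *msg)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong_explicit(&demo_busy, &expected, true,
						     memory_order_acquire,
						     memory_order_relaxed))
		return false;	/* somebody else still owns the buffer */

	/* The buffer was zeroed by demo_finish(), so the copy stays terminated. */
	strncpy(demo_buf, msg, sizeof(demo_buf) - 1);
	return true;
}
```

With this pairing, a caller that successfully claims the flag is guaranteed to see the wiped buffer. A fence issued only after the store would leave the wipe free to be observed after the flag has already been handed over.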

It has nothing to do with the seqbuf: I added the seqbuf code myself, but the barrier was already in the code that I copied from.

In mainline today it looks like this, as part of the x86 code (note the call to put_cpu(): it is also in my patchset, but it lives in the arch-specific code rather than the generic code):

: 		/* Check if there was a partial line. */
: 		if (last_i < len) {
: 			print_seq_line(s, last_i, len - 1);
: 			pr_cont("\n");
: 		}
: 	}
:
: 	clear_bit(0, &backtrace_flag);
: 	smp_mb__after_atomic();
: 	put_cpu();
: }

The barrier was not intended to have anything to do with put_cpu() either, though, since it was added before put_cpu() arrived:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=554ec063982752e9a569ab9189eeffa3d96731b2

There's nothing in the commit message explaining the barrier, and I really can't see what it is for.


Daniel.