Re: [PATCH] Thermal: Fix lockup of cpu_down()

From: Srinivas Pandruvada
Date: Tue Jul 16 2013 - 14:38:56 EST


On 07/16/2013 11:33 AM, Steven Rostedt wrote:
On Tue, 2013-07-16 at 11:19 -0700, Srinivas Pandruvada wrote:
Thanks. How did you trigger this error condition? Was it found by code
review, or do you have a way to reproduce it?
No, my tests run a CPU hotplug stress test, and the system would hang. I
had to bisect to find the bug, and it came down to this code. What was
weird is that the module didn't appear to be loaded. Then I ran the
ftrace function tracer, started from the kernel command line with the
following:

ftrace=function ftrace_filter=get_online_cpus,put_online_cpus

and after I booted up, I ran:

cat /debug/tracing/trace | perl -e '
  my @stack;
  while (<>) {
    if (/get_online/) {
      push @stack, $_;
    } elsif (/put_online/) {
      pop @stack;
    }
  }
  foreach my $line (@stack) {
    print $line;
  }'

It showed that get_online_cpus() was called twice without a matching
put_online_cpus(). The strange thing was that the calls had no parent
function. That is when I realized that the module had been loaded,
failed to initialize, and was then unloaded, which explains why it
didn't show up in my lsmod.
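The perl filter above keeps a single global stack, so a put_online_cpus() from one task can cancel a get_online_cpus() from another. Below is a minimal shell/awk sketch of the same matching idea, keyed on the trace's task-PID field instead, so only calls from the same task are paired. The helper name unmatched_gets is hypothetical. It assumes the default function-tracer line layout (`comm-pid [cpu] flags timestamp: function <-parent`) and that each get/put pair runs in the same task.

```shell
# unmatched_gets (hypothetical helper): read function-tracer output on
# stdin and print every get_online_cpus() line that was never balanced
# by a put_online_cpus() from the same task.
# Assumes the default layout "comm-pid [cpu] flags timestamp: func <-parent".
unmatched_gets() {
    awk '
    # Skip the "#" header lines at the top of the trace file.
    /^[ \t]*#/ { next }

    # A reference was taken: push the line onto this task stack.
    /: get_online_cpus/ {
        task = $1                     # first field, e.g. "modprobe-1234"
        depth[task]++
        stack[task, depth[task]] = $0
        next
    }

    # A reference was dropped: pop the most recent get from this task.
    /: put_online_cpus/ {
        task = $1
        if ((task in depth) && depth[task] > 0) {
            delete stack[task, depth[task]]
            depth[task]--
        }
        next
    }

    # Anything still on a stack was never released.
    END {
        for (task in depth)
            for (i = 1; i <= depth[task]; i++)
                print stack[task, i]
    }'
}
```

Usage, with debugfs mounted at /debug as in the mail: `unmatched_gets < /debug/tracing/trace`. Since get_online_cpus() is a reference count, only the leftover entries matter; the per-task keying just makes it clearer which task (for example, a module loader) left them behind.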

Then it was just a matter of looking at all the calls to
get_online_cpus() in the commit, and the bug was rather obvious.

With the patch applied, the lockup went away.

-- Steve
Thanks for your help in debugging and isolating this.



