rdmsr_safe_on_cpu hangs?

From: Dan Upton
Date: Tue Feb 26 2008 - 16:58:48 EST


I'm seeing this behavior in both 2.6.23.14 and 2.6.24.3, on x86-64 on
a Core2 Duo. As part of some work on temperature-based scheduling,
I've added calls to rdmsr_on_cpu, essentially duplicating the ones in
hwmon/coretemp.c, in a few places in sched.c and sched_debug.c. The
calls in sched_debug.c are of course only reached once the system has
booted all the way, and I haven't run into any problems reading (and
getting correct values) there. When I saw rdmsr_on_cpu hang, I
switched to rdmsr_safe_on_cpu. I thought that was supposed to fail
gracefully, but it still seems to hang. I have two different
problems:

- In the 2.6.23.14 kernel, I was trying to read via a function called
  from sched_balance_self. It seems to work fine until it becomes
  aware of the second core (i.e., rdmsr_safe_on_cpu(0,
  IA32_THERM_STATUS, &eax, &edx) works fine, but
  rdmsr_safe_on_cpu(1, ...) never returns).
- In the 2.6.24.3 kernel, it works fine when I call it from
  sched_balance_self. I added a second call to the function from
  prepare_task_switch, so I could save some relevant information
  before swapping the task away, and it eventually hangs reading on
  core 0. The hang comes after the "Booting the kernel" message but
  before "Red Hat nash" starts.
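
For reference, here is a minimal user-space sketch of how the eax value
returned by the IA32_THERM_STATUS read might be turned into a
temperature. It assumes the layout documented for this MSR (bit 31 is
the "reading valid" flag, bits 22:16 hold the digital readout in
degrees C below TjMax). The function name and the TjMax value passed in
are illustrative assumptions, not code from the kernel or from my patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of IA32_THERM_STATUS: set when the digital readout is valid. */
#define THERM_STATUS_VALID (1u << 31)

/*
 * Hypothetical helper: convert the low 32 bits (eax) of an
 * IA32_THERM_STATUS read into millidegrees C.
 *
 * tjmax_mc is the assumed TjMax in millidegrees C; the right value
 * depends on the CPU model and is supplied by the caller.
 *
 * Returns false and leaves *temp_mc untouched when the valid bit is
 * clear.
 */
bool therm_status_to_mc(uint32_t eax, int tjmax_mc, int *temp_mc)
{
    if (!(eax & THERM_STATUS_VALID))
        return false;

    /* Bits 22:16: degrees C below TjMax. */
    *temp_mc = tjmax_mc - (int)((eax >> 16) & 0x7f) * 1000;
    return true;
}
```

With an assumed TjMax of 100000 millidegrees, a readout of 20 in bits
22:16 with the valid bit set corresponds to 80000 millidegrees C.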

I guess the question is: am I just misunderstanding how
rdmsr_safe_on_cpu is meant to be used, is it an issue with that
particular MSR (some of the stuff I've read indicates that rdmsr_safe
was really only implemented as a prerequisite for the coretemp
driver), or is something wrong with rdmsr_safe_on_cpu itself?

-dan