Re: Intel i7/X 980 freezes with CONFIG_INTEL_IDLE and frequency scaling

From: Carsten Emde
Date: Mon Apr 04 2011 - 10:30:33 EST


On 04/02/2011 09:33 AM, Carsten Emde wrote:
after upgrading a Fedora 14 based Intel i7/X 980 box to 2.6.39-rc1, the
system freezes when frequency scaling is started. Looks like a complete
CPU halt, since neither SysRq nor anything else can convince the box to
provide any information.
Loading acpi_cpufreq is safe, but selecting the performance
scaling_governor immediately freezes the system. Selecting the ondemand
scaling_governor freezes the system when load is generated for the first
time, presumably when the frequency is increased. Fortunately, when the
kernel was built with another config file, the system did not freeze.
After enabling and disabling some suspicious config items (before going
through the hassle of bisecting the config using ktest.pl), the culprit
was found to be CONFIG_INTEL_IDLE. Fedora 14 enables it.

While digging deeper into this mystery, I found a detail that may connect this problem to the poweroff problem (https://lkml.org/lkml/2011/3/31/458).

The CPU (i7/X 980 box) apparently freezes when it executes

__monitor(&current_thread_info()->flags, 0, 0);

which is

static inline void __monitor(const void *eax, unsigned long ecx,
			     unsigned long edx)
{
	/* "monitor %eax, %ecx, %edx;" */
	asm volatile(".byte 0x0f, 0x01, 0xc8;"
		     :: "a" (eax), "c" (ecx), "d"(edx));
}

After removing this line from the execution path (see the patch below), the problem is gone and the system no longer freezes when processor frequency scaling is enabled. I double-checked this, so I am pretty sure that this is, in fact, the case.

Interestingly, __monitor() is also called in mwait_play_dead() in arch/x86/kernel/smpboot.c, where the poweroff problem originates. Maybe a misbehavior of this instruction also causes the poweroff problem.

But we should solve this problem first. Len or anybody else at Intel, I don't think that I can go any further without your help. Do you have any explanation for what is going on here when the processor

vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz
stepping : 2

encounters the above instruction?
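
For reference, the decimal "model : 44" shown above is the same value as the 0x2C case label touched by the patch below. The following is only a minimal userspace sketch of how family, model and stepping are packed into the CPUID leaf 1 EAX signature; it is not kernel code. The names decode_cpu_sig() and is_model_0x2c() are hypothetical, and the raw value 0x000206C2 used in the comments is assembled here from the family/model/stepping fields listed above, not read from the hardware.

```c
#include <stdint.h>

/* Decoded fields of a CPUID.01H:EAX processor signature. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/*
 * Decode a raw CPUID.01H:EAX value (hypothetical helper).
 * For base family 6 and 15, the extended model bits (19:16) form the
 * high nibble of the model number. For base family 15, the extended
 * family bits (27:20) are added to the family number.
 */
struct cpu_sig decode_cpu_sig(uint32_t eax)
{
	struct cpu_sig sig;
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int base_model = (eax >> 4) & 0xf;
	unsigned int ext_model = (eax >> 16) & 0xf;
	unsigned int ext_family = (eax >> 20) & 0xff;

	sig.stepping = eax & 0xf;

	sig.family = base_family;
	if (base_family == 0xf)
		sig.family += ext_family;

	sig.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		sig.model |= ext_model << 4;

	return sig;
}

/*
 * Hypothetical helper: would this signature take the 0x2C case?
 * Assumed example input: 0x000206C2 (family 6, model 0x2C, stepping 2).
 */
int is_model_0x2c(uint32_t eax)
{
	struct cpu_sig sig = decode_cpu_sig(eax);

	return sig.family == 6 && sig.model == 0x2C;
}
```

Calling decode_cpu_sig(0x000206C2) yields family 6, model 44 and stepping 2, which matches the cpuinfo fields above, while a signature with model 0x25 (the other Westmere entry) does not match is_model_0x2c().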

Thanks,
-Carsten.


Index: linux-2.6.39-rc1/drivers/idle/intel_idle.c
===================================================================
--- linux-2.6.39-rc1.orig/drivers/idle/intel_idle.c
+++ linux-2.6.39-rc1/drivers/idle/intel_idle.c
@@ -99,11 +99,54 @@ static unsigned long long auto_demotion_
*/
#define CPUIDLE_FLAG_TLB_FLUSHED 0x10000

+static int intel_idle_monitor(struct cpuidle_device *dev,
+ struct cpuidle_state *state, int monitor);
+
+static int intel_idle_westmere(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ return intel_idle_monitor(dev, state, 0);
+}
+
+static int intel_idle(struct cpuidle_device *dev,
+ struct cpuidle_state *state)
+{
+ return intel_idle_monitor(dev, state, 1);
+}
+
/*
* States are indexed by the cstate number,
* which is also the index into the MWAIT hint array.
* Thus C0 is a dummy.
*/
+static struct cpuidle_state westmere_cstates[MWAIT_MAX_NUM_CSTATES] = {
+ { /* MWAIT C0 */ },
+ { /* MWAIT C1 */
+ .name = "C1-NHM",
+ .desc = "MWAIT 0x00",
+ .driver_data = (void *) 0x00,
+ .flags = CPUIDLE_FLAG_TIME_VALID,
+ .exit_latency = 3,
+ .target_residency = 6,
+ .enter = &intel_idle_westmere },
+ { /* MWAIT C2 */
+ .name = "C3-NHM",
+ .desc = "MWAIT 0x10",
+ .driver_data = (void *) 0x10,
+ .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 20,
+ .target_residency = 80,
+ .enter = &intel_idle_westmere },
+ { /* MWAIT C3 */
+ .name = "C6-NHM",
+ .desc = "MWAIT 0x20",
+ .driver_data = (void *) 0x20,
+ .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 200,
+ .target_residency = 800,
+ .enter = &intel_idle_westmere },
+};
+
static struct cpuidle_state nehalem_cstates[MWAIT_MAX_NUM_CSTATES] = {
{ /* MWAIT C0 */ },
{ /* MWAIT C1 */
@@ -212,7 +255,8 @@ static struct cpuidle_state atom_cstates
* @state: cpuidle state
*
*/
-static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
+static int intel_idle_monitor(struct cpuidle_device *dev,
+ struct cpuidle_state *state, int monitor)
{
unsigned long ecx = 1; /* break on interrupt flag */
unsigned long eax = (unsigned long)cpuidle_get_statedata(state);
@@ -239,8 +283,8 @@ static int intel_idle(struct cpuidle_dev

stop_critical_timings();
if (!need_resched()) {
-
- __monitor((void *)&current_thread_info()->flags, 0, 0);
+ if (monitor)
+ __monitor((void *)&current_thread_info()->flags, 0, 0);
smp_mb();
if (!need_resched())
__mwait(eax, ecx);
@@ -338,12 +382,17 @@ static int intel_idle_probe(void)
case 0x2E: /* Nehalem-EX Xeon */
case 0x2F: /* Westmere-EX Xeon */
case 0x25: /* Westmere */
- case 0x2C: /* Westmere */
cpuidle_state_table = nehalem_cstates;
auto_demotion_disable_flags =
(NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE);
break;

+ case 0x2C: /* Special Westmere? */
+ cpuidle_state_table = westmere_cstates;
+ auto_demotion_disable_flags =
+ (NHM_C1_AUTO_DEMOTE | NHM_C3_AUTO_DEMOTE);
+ break;
+
case 0x1C: /* 28 - Atom Processor */
cpuidle_state_table = atom_cstates;
break;