Re: 2.6.21-rc5: maxcpus=1 crash in cpufreq: kernel BUG at drivers/cpufreq/cpufreq.c:82!

From: Venki Pallipadi
Date: Mon Mar 26 2007 - 14:11:00 EST



On Mar 26, 2007, at 2:04 AM, Ingo Molnar wrote:


trying to debug the forcedeth crash triggered another, new
v2.6.20 -> v2.6.21 regression:

maxcpus=1 on a dual-core system crashes the x86_64 SMP kernel in
lock_policy_rwsem_write() - see the crash log below. Config attached.

i suspect it could be related to this recent commit:

commit 5a01f2e8f3ac134e24144d74bb48a60236f7024d
Author: Venkatesh Pallipadi <venkatesh.pallipadi@xxxxxxxxx>
Date: Mon Feb 5 16:12:44 2007 -0800

[CPUFREQ] Rewrite lock in cpufreq to eliminate cpufreq/hotplug related issue

Ingo
Calling initcall 0xffffffff8021e003: powernowk8_init+0x0/0x88()
powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (version 2.00.00)
powernow-k8: BIOS error - no PSB or ACPI _PSS objects
------------[ cut here ]------------
kernel BUG at drivers/cpufreq/cpufreq.c:82!
invalid opcode: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.21-rc5 #14
RIP: 0010:[<ffffffff804a0edd>] [<ffffffff804a0edd>] lock_policy_rwsem_write+0x2b/0x7f
RSP: 0000:ffff81003ff31de0 EFLAGS: 00010246
RAX: 00000000ffffffff RBX: ffffffff80aa46a8 RCX: 0000000000000000
RDX: ffffffff80741080 RSI: 0000000000000202 RDI: 0000000000000001
RBP: ffff81003ff31e00 R08: ffff81003ff30000 R09: ffff810002dfdab0
R10: ffff810002dfdab0 R11: ffff810002dfdab0 R12: 0000000000000001
R13: ffffffff806cc2c0 R14: ffffffff80a37920 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff80741000(0000) knlGS: 0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000f06f53 CR3: 0000000000201000 CR4: 00000000000006e0
Process swapper (pid: 1, threadinfo ffff81003ff30000, task ffff81003ff2a790)
Stack: ffffffff806cc2c0 ffffffff80aa46a8 ffffffff80aa46a8 ffffffff806cc2c0
ffff81003ff31e20 ffffffff804a1538 ffff810002dfdab0 ffffffff806f6a40
ffff81003ff31e50 ffffffff803dffdf 00000000ffffffed 00000000ffffffed
Call Trace:
[<ffffffff804a1538>] cpufreq_remove_dev+0x11/0x26
[<ffffffff803dffdf>] sysdev_driver_unregister+0x52/0x98
[<ffffffff804a0d3b>] cpufreq_register_driver+0x12d/0x191
[<ffffffff8021e082>] powernowk8_init+0x7f/0x88
[<ffffffff80a1099e>] init+0x1a8/0x2a9
[<ffffffff8023294e>] schedule_tail+0x36/0x9a
[<ffffffff8020ab98>] child_rip+0xa/0x12
[<ffffffff80a107f6>] init+0x0/0x2a9
[<ffffffff8020ab8e>] child_rip+0x0/0x12


Code: 0f 0b eb fe 4c 63 e8 48 c7 c3 a0 c1 a6 80 4a 8b 04 ed c0 7a
RIP [<ffffffff804a0edd>] lock_policy_rwsem_write+0x2b/0x7f
RSP <ffff81003ff31de0>
Kernel panic - not syncing: Attempted to kill init!


Looks like some error with cpufreq add_dev and remove_dev got exposed by the above patch.

The problem here is:

cpufreq_register_driver() calls sysdev_driver_register()
which in turn calls add() for the CPU already present and ignores the return value from that add().
(drivers/base/sys.c:183).
Not if that add had failed, cpufreq will not know and later a remove () is being called on the same device, which causes the BUG() condition here.

However, I am not sure at this point why this gets triggered only on numcpus=1 case and not on normal boot case.

Will dig into this a bit more and hopefully have a patch to fix this soon.

Thanks,
~Venki

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/