RE: [regression]: soft lockup in dmesg after suspend/resume

From: ykzhao
Date: Wed Jan 06 2010 - 01:36:34 EST


On Wed, 2010-01-06 at 14:12 +0800, Zou, Nanhai wrote:
> >>-----Original Message-----
> >>From: Zhao, Yakui
> >>Sent: 2010-01-04 13:37
> >>To: mingo@xxxxxxx
> >>Cc: linux-kernel@xxxxxxxxxxxxxxx; Zou, Nanhai; Pallipadi, Venkatesh
> >>Subject: [regression]: soft lockup in dmesg after suspend/resume
> >>
> >>Hi,
> >> My box works well before suspend/resume, but it prints the
> >>following warning message after suspend/resume.
> >>[1266874868.022103] Enabling non-boot CPUs ...
> >>[1266874868.022198] BUG: soft lockup - CPU#0 stuck for 0s! [kthreadd:2]
> >>
> >> Also, after adding the boot option "printk.time=1", I find that
> >>the log timestamp jumps spontaneously from 76 to 1266874868.
> >>[ 76.475266] CPU3 is down
> >>[ 76.475312] Extended CMOS year: 2000
> >>[1266874868.020631] x86 PAT enabled: cpu 0, old 0x7040600070406, new
> >>0x7010600070106
> >>[1266874868.021779] Back to C!
> >>[1266874868.022003] CPU0: Thermal LVT vector (0xfa) already installed
> >>[1266874868.022060] Extended CMOS year: 2000
> >>
> >> More detailed info can be found in the attached file of
> >>dmesg_after_origin.
> >>
> >> After looking at the source code, I find that on this box the TSC runs
> >>at a constant rate across P/T states and does not stop in deep C-states,
> >>so sched_clock_stable is set to 1. In that case the TSC time is used
> >>directly in sched_clock_cpu().
> >>
> >> Then I did another test on this box, in which the clock_stable flag is
> >>saved/restored during suspend/resume (I added this using a per-cpu
> >>structure). On entering the suspended state, clock_stable is cleared,
> >>and when the system is resumed, clock_stable is set again.
> >>Unfortunately the soft lockup still occurs. The file
> >>dmesg_after_test2 is the dmesg log taken after saving/restoring the
> >>clock_stable flag during suspend/resume.
> >>
> >> How about clearing the sched_clock_stable flag even when the TSC
> >>doesn't stop in deep C-states? From my test it seems that the TSC value
> >>is unknown after suspend/resume.
> >>
> >>Thanks.
> >> Yakui.
>
> Hi Ingo,
> What do you think about this bug?
> It was introduced by the sched_clock_stable flag. The TSC is stable
> except when the CPU is suspending, and we see a suspend/resume hang on
> those machines.

It is not a suspend/resume hang. The main issue is that the kernel
prints the soft lockup warning message after suspend/resume. And
with the boot option "printk.time=1", we find that the dmesg
timestamps jump spontaneously after suspend/resume.
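
To illustrate the timestamp jump, here is a minimal userspace sketch. It is
not kernel code: demo_clock, demo_clock_read() and the 1 s clamp are
hypothetical names and values chosen only for illustration, and the real
filtering in sched_clock_cpu() is more involved. It only shows the idea that
a clock which trusts the raw counter passes any jump straight through, while
a filtered clock does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; these names are NOT kernel symbols. */
#define DEMO_MAX_STEP_NS 1000000000ULL /* assumed clamp: at most 1 s per read */

struct demo_clock {
	int stable;        /* plays the role of sched_clock_stable */
	uint64_t last_raw; /* previous raw counter reading, in ns */
	uint64_t clock;    /* filtered clock value, in ns */
};

/* Return a timestamp in ns for the raw counter reading raw_ns. */
static uint64_t demo_clock_read(struct demo_clock *c, uint64_t raw_ns)
{
	uint64_t delta;

	assert(c != NULL);

	if (c->stable)
		return raw_ns; /* raw counter is trusted as-is */

	/* Ignore backward steps and clamp large forward steps. */
	delta = raw_ns > c->last_raw ? raw_ns - c->last_raw : 0;
	if (delta > DEMO_MAX_STEP_NS)
		delta = DEMO_MAX_STEP_NS;

	c->last_raw = raw_ns;
	c->clock += delta;
	return c->clock;
}
```

With the stable field set, a raw reading of 1266874868.020631 s that follows
76.475312 s is returned unchanged, which matches the jump seen in the log.
With it cleared, the same reading advances the clock by at most one second in
this model, to 77.475312 s.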

thanks.
Yakui

> Maybe we can ignore the sched_clock_stable flag while the CPU is suspending?
>
> Thanks
> Zou Nan hai
>
>
