Re: Early test: hangs in mm/compact.c w. Linus's 12d7aacab56e9ef185c

From: P. Christeas
Date: Tue Nov 04 2014 - 04:51:52 EST


On Tuesday 04 November 2014, Vlastimil Babka wrote:
> Please do keep testing (and see below what we need), and don't try
> another tree - it's 3.18 we need to fix!
Let me apologize for (and warn you about) the poor quality of this report and
its debug data.
It is on a system meant for everyday desktop usage, not kernel development.
Thus, it is tuned mostly for performance and is only "slightly" debuggable.

> I'm not sure what you mean by "race" here and your snippet is
> unfortunately just a small portion of the output ...

It is a shot in the dark. The system becomes non-responsive (narrowed down to
desktop apps waiting on each other, or X+kwin blocking); I can feel the CPU
heating up and /sometimes/ there is disk I/O.

No BUG, Oops or any other kernel message. (Is printk level 4 adequate?)
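
For reference, a minimal sketch of how the console log level could be checked,
assuming "printk level 4" refers to the first field of /proc/sys/kernel/printk
(console_loglevel). With that value at 4, only messages more severe than
"warning" reach the console; everything else still lands in the dmesg ring
buffer. The helper names below are hypothetical, made up for illustration.

```python
# Severity names for kernel log levels 0..7 (0 is the most severe).
LEVEL_NAMES = ["emerg", "alert", "crit", "err",
               "warning", "notice", "info", "debug"]

# The four fields of /proc/sys/kernel/printk, in order.
PRINTK_FIELDS = ("console_loglevel", "default_message_loglevel",
                 "minimum_console_loglevel", "default_console_loglevel")


def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into a dict."""
    values = [int(v) for v in text.split()]
    if len(values) != len(PRINTK_FIELDS):
        raise ValueError("expected 4 integer fields, got %d" % len(values))
    return dict(zip(PRINTK_FIELDS, values))


def console_visible_levels(console_loglevel):
    """Return the severities printed to the console.

    A message is shown on the console only if its level is numerically
    lower than console_loglevel.
    """
    upper = max(0, min(console_loglevel, len(LEVEL_NAMES)))
    return LEVEL_NAMES[:upper]
```

On a live system the input would come from reading /proc/sys/kernel/printk;
the parsing is kept separate from the file access so it can be tried on any
string.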

Then, I try to drop to a console and collect as much data as possible with
SysRq.

The snippet I'd sent you is from the all-cpus backtrace (l), trying to see which
traces appear consistently during the lockup. There are also the huge traces of
"task-states" (t), but I reckon they are too noisy.
That trace also matches the usage profile because, as far as I can guess, the
issue appears when allocating during I/O load.
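
Looking for traces that recur across several SysRq-l dumps can be sketched
roughly as below. This is only an illustration, not the procedure actually
used for this report: it assumes each dump was saved as text (for example from
dmesg) and that stack frames look like "[<address>] function+0xoff/0xsize",
optionally preceded by "?". The function names are hypothetical.

```python
import re
from collections import Counter

# Matches one stack frame, e.g. "[<ffffffff81234567>] ? some_func+0x1a/0x40".
# The frame layout is an assumption about the kernel's backtrace format.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def functions_in_dump(dump_text):
    """Return the set of function names seen in one backtrace dump.

    Frames marked with "?" (unreliable) are included as well.
    """
    return set(FRAME_RE.findall(dump_text))


def recurring_functions(dumps):
    """Count in how many dumps each function name appears.

    `dumps` is an iterable of strings, one per SysRq-l capture.
    """
    counts = Counter()
    for dump in dumps:
        counts.update(functions_in_dump(dump))
    return counts
```

Functions whose count equals the number of dumps are the ones present in every
capture, which makes them the first candidates to look at.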

After turning on full preemption, I have been able to terminate/kill all tasks
and continue with the same kernel but a new userspace.

> OK so the process is not dead due to the problem? That probably rules
> out some kinds of errors but we still need the full output. Thanks in
> advance.
> I'm not aware of this, CCing lkml for wider coverage.

Thank you. As I said in the first mail, this is an early report of a possible
3.18 regression. I'm trying to narrow down the case and make it reproducible
or get a good trace.

Attached is my current .config


Attachment: config-3.18.gz
Description: application/gzip