Re: v2.6.21.5-rt19

From: Fernando Pablo Lopez-Lezcano
Date: Sun Jul 08 2007 - 23:53:41 EST


On Mon, 9 Jul 2007, Gabriel C wrote:
Fernando Lopez-Lezcano wrote:
On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
* Fernando Lopez-Lezcano <nando@xxxxxxxxxxxxxxxxxx> wrote:
Changes since 2.6.21.5-rt18:
- Fixed a nasty and hard to track down slowness / boot problem on SMP
machines with CONFIG_NOHZ enabled. The problem was caused by the timer
wheel base lock held during the get_next_timer_interrupt() call in the
idle path, which eventually led to a bogus PI boosting of the idle task
and in consequence a stale wrong scheduler selection for the affected idle
task.

Kudos to Carsten Emde, who patiently and meticulously isolated the
problem and provided the traces, which allowed to identify the root cause.

Problem solution: Prevent idle task boosting

Maybe someone remember me whining about troubles with 2.6.21-rt2..18 on my Core2 T7200 laptop (fujitsu-siemens amilo i1520).

Althought I'm still with my fingers crossed, I can tell the good news are that 2.6.21.5-rt19 (and -rt20) does behave far better now on the very same box.

Yes, it works much better indeed...

Ingo: is there a place where I can read about the changes in different rtxx releases? What is new/better/fixed in rt20? (I see scheduler stuff in a diff from rt19 to rt20 but I don't really know what it means).

and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when CFS was merged.

i _think_ Rui might have seen two separate problems. Perhaps by the time we fixed the first problem (which Rui saw since -rt2) we introduced the other one via -rt11 - which then got fixed in -rt19.

Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
really trying to provide a "stable" rt kernel for audio usage and
including another subsystem into rt is - IMHO - not going to help.
What's the chance of splitting things?

btw., we'd love to get more feedback regarding CFS. CFS is a completely new scheduler for Linux.

Then I'd rather have it separate from rt.

It has a design centered around keeping application latencies down, so it is ultimately real-time friendly, and it should also make things work better for desktop-ish and audio-ish stuff as well. (even under SCHED_OTHER)


Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
list):

On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:

Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
on my desktop but on my laptop it makes Firefox and Tomboy to crash.
On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
problem.

I managed to completely hang firefox (fc7) with flash 9 installed
(unkillable even with -9).

Firefox with flash 9 does not work good , there are a lot bugs reported about ( just google ) and it hangs on vanilla or whatever other kernels as well. Not only Firefox but also Swiftfox, Opera, Epiphany etc.

The most time Firefox dies when you use flash 9 and close a window or a tab.

More tests...

The problem is the rt kernel AFAICT, this goes beyond Flash 9, way beyond:

_OpenOffice_ hangs with 2.6.21.5-rt20, works fine with stock Fedora 7 kernel. Flash 9 hangs with 2.6.21.5-rt20, works fine with the stock Fedora 7 kernel. Same machine booting different kernels, I'd say it is the kernel.

The only way out for a hung app is a reboot.

Ingo: what would be a good way to trace this? It makes the rt kernels not very usable at least on this hardware (more tests tomorrow in the CCRMA machines).

Same on 2.6.21.5-rt18 with CONFIG_NO_HZ not set.

-- Fernando
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/