Re: 2.6 vs 2.4 regression when running gnomemeeting

From: Con Kolivas
Date: Sun Dec 21 2003 - 20:48:56 EST


On Mon, 22 Dec 2003 12:19, Christian Meder wrote:
> On Sun, 2003-12-21 at 09:57, Ingo Molnar wrote:
> > * Christian Meder <chris@xxxxxxxxxxxxxxx> wrote:
> > > I tried to verify your suggestion and found that the P_RTEMS symbol is
> > > not defined on Linux. It seems to be some other kind of realtime
> > > operating system. So the code in question already uses usleep. Now I'm
> > > still digging for other occurances of sched_yield in the pwlib
> > > sources.
> >
> > could you try to strace -f gnomemeeting? Maybe there's no sched_yield()
> > at all. Could you also try to run the non-yielding loop code via:
> >
> > nice -19 ./loop &
> >
> > do a couple of such loops still degrade gnomemeeting?
>
> I found the culprit. It's sched_yield again. When I straced gnomemeeting
> even without load I saw a lot of sched_yields. So I googled around for
> 2.6 and sched_yield and found among others
> http://www.hpl.hp.com/research/linux/kernel/o1-openmp.php by David
> Mosberger. I tried gnomemeeting with the romp hack at the end of the
> article which changes all sched_yields to noops via library preloading.
> The difference was _really_ impressive. No matter how many non-yield
> loops and kernel compiles I ran gnomemeeting didn't even skip once.

It's a valuable lesson to us that the sched_yield() issue is one we should be
suspicious of. I'm very happy that it performs well for you once the coding
is corrected.

[SNIP]

> Defining Yield to noop and building a new libpt solved the problem
> permanently for me.
>
> It seems that not all people have got problems with gnomemeeting and
> 2.6. Damien Sandras (the gnomemeeting maintainer) for example reported
> that he hasn't got any problems with gnomemeeting on 2.6 while compiling
> in parallel. So I guess it's depending on the frequency of sched_yields
> one gets in gnomemeeting. Which is probably depending on the processor
> speed, etc.

It probably hits that code path less frequently in slower cpus ironically.

> That just leaves the question what is the proper fix, to send it
> upstream and to note the phenomenon down in a faq.

The sched_yield issue is noted in faqs, but it should be highlighted in the
interactivity section.

> Thanks to all who helped me with debugging advice and if anybody needs
> further information just ask.

No, thank you for tracking this down fully. No amount of reassuring that it's
not 2.6's fault will suffice for us unless we can find the real cause and
solutions for these issues.

Regards,
Con

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/