Re: High scheduler wake up times

From: Shawn Bohrer
Date: Sat Jan 30 2010 - 19:46:45 EST


On Sat, Jan 30, 2010 at 06:35:49PM -0600, Shawn Bohrer wrote:
> On Sat, Jan 30, 2010 at 04:11:14PM -0800, Arjan van de Ven wrote:
> > On Sat, 30 Jan 2010 17:45:51 -0600
> > Shawn Bohrer <shawn.bohrer@xxxxxxxxx> wrote:
> >
> > > Hello,
> > >
> > > Currently we have a workload that depends on around 50 processes that
> > > wake up 1000 times a second, do a small amount of work, and go back to
> > > sleep. This works great on RHEL 5 (2.6.18-164.6.1.el5), but on recent
> > > kernels we are unable to achieve 1000 iterations per second. Using
> > > the simple test application below on RHEL 5 2.6.18-164.6.1.el5, I can
> > > run 500 of these processes and still achieve 999.99 iterations per
> > > second. Running just 10 of these processes on the same machine with
> > > 2.6.32.6 produces results like:
> > > [...]
> >
> > there's an issue with your expectation btw.
> > what your application does, in practice is
> >
> > <wait 1 millisecond>
> > <do a bunch of work>
> > <wait 1 millisecond>
> > <do a bunch of work>
> > etc
> >
> > you would only be able to get close to 1000 per second if "bunch of
> > work" is nothing... but it isn't.
> > so let's assume "bunch of work" is 100 microseconds; the basic period
> > of your program (ignoring any costs/overhead in the implementation)
> > is 1.1 milliseconds, which is approximately 909 per second, not 1000!
> >
> > I suspect that the 1000 you get on RHEL5 is a bug in the RHEL5 kernel
> > where it gives you a shorter delay than what you asked for, since it's
> > clearly not a correct number to get.
> >
> > (and yes, older kernels had such rounding bugs; current kernels go
> > to great lengths to give applications *exactly* the delay they are
> > asking for...)
>
> I agree that we are currently depending on a bug in epoll. The epoll
> implementation currently rounds up to the next jiffy, so specifying a
> timeout of 1 ms really just wakes the process up at the next timer tick,
> which can be less than 1 ms away. I have a patch to fix epoll by
> converting it to use schedule_hrtimeout_range() that I'll gladly send,
> but I still need a way to get 1000 wakeups per second without relying
> on that bug.
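Arjan's arithmetic can be checked directly: a relative sleep plus the work time gives the period, so 1 ms + 100 us = 1.1 ms, or about 909 iterations per second. The usual way to keep a fixed rate with a blocking sleep is to sleep until an absolute deadline instead of for a relative interval. This is a minimal sketch under stated assumptions, not our test application and not something proposed in this thread: it swaps in POSIX clock_nanosleep() with TIMER_ABSTIME on CLOCK_MONOTONIC, and the helper names relative_sleep_rate(), timespec_add_ns() and run_fixed_rate() are hypothetical. It only keeps the work time from adding to the period; it does nothing about scheduler wakeup latency itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: rate of a loop that does `work_ns` of work and then
 * sleeps a *relative* `sleep_ns`. The two add up, so 1 ms of sleep plus
 * 100 us of work gives a 1.1 ms period, i.e. about 909 iterations/s. */
double relative_sleep_rate(long sleep_ns, long work_ns)
{
    assert(sleep_ns + work_ns > 0);
    return (double)NSEC_PER_SEC / (double)(sleep_ns + work_ns);
}

/* Hypothetical helper: advance an absolute deadline by `period_ns`,
 * keeping tv_nsec in the range [0, NSEC_PER_SEC). */
void timespec_add_ns(struct timespec *ts, long period_ns)
{
    ts->tv_nsec += period_ns;
    while (ts->tv_nsec >= NSEC_PER_SEC) {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec++;
    }
}

/* Hypothetical fixed-rate loop: each iteration sleeps until the next
 * absolute deadline, so time spent in do_work() does not stretch the
 * period as long as it stays shorter than the period. do_work may be
 * NULL. Returns 0 on success, -1 on error. */
int run_fixed_rate(long period_ns, int iterations, void (*do_work)(void))
{
    struct timespec next;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        int rc;
        timespec_add_ns(&next, period_ns);
        /* clock_nanosleep() returns the error number directly;
         * retry if interrupted by a signal. */
        while ((rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                     &next, NULL)) == EINTR)
            ;
        if (rc != 0)
            return -1;
        if (do_work != NULL)
            do_work();
    }
    return 0;
}
```

If an iteration overruns, its next deadline is already in the past and clock_nanosleep() returns immediately, so the loop catches up instead of drifting.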

I guess I should add that I think we could achieve the same effect (a
wakeup every millisecond) by adding a periodic timerfd with a period of
1 ms (or less) to our epoll set. However, newer kernels still appear to
have much larger scheduler wakeup latencies, and I still need a way to
fix that before we can move to a newer kernel.
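The timerfd approach might look roughly like the following. This is a minimal sketch assuming Linux with glibc's timerfd_create() and epoll_create1(); add_periodic_timerfd() and wait_for_ticks() are hypothetical names, and the loop stands in for a real event loop that would have other fds in the set. Because the timer is periodic on CLOCK_MONOTONIC, its expirations stay 1 ms apart regardless of how long the work takes, and the count returned by read() on the timerfd shows whether any ticks were missed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create a periodic CLOCK_MONOTONIC timerfd that
 * fires every `period_ns` (< 1 s) and add it to the epoll set `epfd`.
 * Returns the timerfd, or -1 on error. */
int add_periodic_timerfd(int epfd, long period_ns)
{
    assert(period_ns > 0 && period_ns < 1000000000L);
    int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    if (tfd < 0)
        return -1;

    struct itimerspec its = {
        .it_interval = { .tv_sec = 0, .tv_nsec = period_ns }, /* period */
        .it_value    = { .tv_sec = 0, .tv_nsec = period_ns }, /* first expiry */
    };
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
    if (timerfd_settime(tfd, 0, &its, NULL) != 0 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev) != 0) {
        close(tfd);
        return -1;
    }
    return tfd;
}

/* Hypothetical loop: block in epoll_wait() until `ticks` timer expirations
 * have been consumed. Each read() of the timerfd yields a uint64_t count of
 * expirations since the previous read, so late wakeups show up as counts
 * greater than 1. Returns the number of epoll wakeups, or -1 on error. */
long wait_for_ticks(int epfd, int tfd, uint64_t ticks)
{
    struct epoll_event events[8];
    uint64_t seen = 0;
    long wakeups = 0;

    while (seen < ticks) {
        int n = epoll_wait(epfd, events, 8, -1);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal; wait again */
            return -1;
        }
        wakeups++;
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd != tfd)
                continue; /* other fds in the set would be handled here */
            uint64_t expirations;
            ssize_t r = read(tfd, &expirations, sizeof expirations);
            if (r == (ssize_t)sizeof expirations)
                seen += expirations;
        }
    }
    return wakeups;
}
```

A caller would create the set with epoll_create1(EPOLL_CLOEXEC), call add_periodic_timerfd(epfd, 1000000L) for a 1 ms tick, and compare the number of wakeups with the number of expirations: fewer wakeups than expirations means some wakeups arrived more than one period late.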

--
Shawn