Re: [PATCH 4/7] sched: implement force_cpus_allowed()

From: Peter Zijlstra
Date: Tue Dec 08 2009 - 08:36:44 EST


On Tue, 2009-12-08 at 21:23 +0900, Tejun Heo wrote:
> On 12/08/2009 09:10 PM, Peter Zijlstra wrote:
> > Hotplug and deterministic are not to be used in the same sentence; it's
> > an utter slow path and I'd much rather have simple code than clever code
> > there -- there have been way too many 'interesting' hotplug problems.
>
> Slowness and indeterminism come in different magnitudes.

Determinism does _not_ come in magnitudes; it's a strictly binary
property: either something is deterministic or it is not.

As to the order of slowness for unplug, that is about maximal: it's _the_
slowest path in the whole kernel.

> > Furthermore, if its objective is to cater to generic thread pools, then
> > I think it's an utter failure, simply because it mandates strict cpu
> > affinity, which basically requires you to write a work scheduler to
> > balance the work load and so on. Much easier is a simple unbounded
> > thread pool that gets balanced by the regular scheduler.
>
> The observation was that most long-running async jobs spend most of
> their time sleeping instead of burning cpu cycles, and that long-running
> ones are relatively few compared to short ones, so strict affinity is
> the more helpful choice. That is the basis of the whole design and the
> reason it has scheduler callbacks to regulate concurrency instead of
> creating a bunch of active workers and letting the scheduler take care
> of them. Work items wouldn't be competing for cpu cycles.
>
> In short, the target workload is the current short work items plus
> long-running, mostly-sleeping async work items, which together cover
> most of the worker pools we have in the kernel.

Ok, maybe, but that is not what I would call a generic thread pool.

So the reason we have tons of idle workqueues around is purely because
of deadlock scenarios? Or is there other crap about?

So why not start simple and only have one thread per cpu (let's call it
events/#) and run all work items there? Then, when you enqueue a work
item and events/# is already busy with a work item from another wq, hand
the work to a global event thread which will spawn a special single-shot
kthread for it -- with a second exception for the reclaim wqs, for which
you'll have a rescue thread that you bind to the right cpu for that
work.

That should get rid of the gazillion threads we have, preserve the
queueing property, and not be as invasive as your current approach.

If they're really as idle as reported, you'll never need the fork-fest
you currently propose, simply because there isn't enough work.

So basically: have events/# service the first non-empty cwq; when there
are more non-empty cwqs, spawn single-shot threads for them or use the
rescue thread.
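
In (completely invented) pseudo-code, the enqueue side of that would be
something like:

	/*
	 * Sketch only -- struct, flag and function names are invented.
	 * events/# handles the first non-empty cwq; work for any other
	 * wq gets a one-shot kthread, except for reclaim wqs, whose
	 * rescuer is bound to the right cpu and woken instead.
	 */
	static void queue_work_sketch(struct cpu_workqueue *cwq,
				      struct work_struct *work)
	{
		struct events_thread *ev = &per_cpu_events[cwq->cpu];

		list_add_tail(&work->entry, &cwq->worklist);

		if (!ev->busy || ev->current_cwq == cwq)
			wake_up_process(ev->task);		/* events/# */
		else if (cwq->wq->flags & WQ_RECLAIM)
			wake_up_process(cwq->wq->rescuer);	/* rescuer */
		else
			kthread_run(run_one_work, work,		/* one-shot */
				    "events-1shot/%d", cwq->cpu);
	}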

> I thought about adding an unbound pool of workers for cpu-intensive
> work items for completeness, but I really couldn't find much use for
> it. If enough users turn out to need something like that, we can add
> an anonymous pool, but for now I really don't see the need to worry
> about it.

And I thought I'd heard multiple parties express interest in exactly
that: btrfs, bdi and pohmelfs come to mind, and crypto also looks like
one that could actually do some work.
