Re: [PATCH] kthread: NUMA aware kthread_create_on_cpu()

From: Eric Dumazet
Date: Sun Nov 28 2010 - 17:52:14 EST


On Sunday 28 November 2010 at 23:40 +0100, Andi Kleen wrote:
> On Sun, Nov 28, 2010 at 08:33:53PM +0100, Eric Dumazet wrote:
> > @@ -101,7 +103,15 @@ static int kthread(void *_create)
> > static void create_kthread(struct kthread_create_info *create)
> > {
> > int pid;
> > -
> > + static int last_cpu_pref = -1;
> > +
> > + if (create->cpu != last_cpu_pref) {
>
> Is that actually thread-safe?

Yes, we use one dedicated task to create all kthreads.

This task runs kthreadd(void *unused) in kernel/kthread.c

Its only duty is to create tasks.
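
To illustrate why the static cache needs no locking, here is a minimal
userspace sketch, not the kernel code. It assumes a single creator that
handles requests one at a time; struct create_request,
set_policy_for_cpu() and run_creator() are hypothetical names made up
for this example.

```c
#include <assert.h>

/* Hypothetical stand-in for struct kthread_create_info: only the field used here. */
struct create_request {
	int cpu;	/* preferred cpu, or -1 for no preference */
};

/* Counts how many times the (modelled) memory policy was switched. */
static int policy_switches;

/* Hypothetical helper: stands in for whatever changes the NUMA policy. */
static void set_policy_for_cpu(int cpu)
{
	(void)cpu;
	policy_switches++;
}

/*
 * Model of the creation step. The static cache is safe without a lock
 * only because a single task ever calls this function.
 */
static void create_one(const struct create_request *req)
{
	static int last_cpu_pref = -1;

	if (req->cpu != last_cpu_pref) {
		set_policy_for_cpu(req->cpu);
		last_cpu_pref = req->cpu;
	}
	/* ... the real code would fork the new kthread here ... */
}

/* Feed a batch of requests through the single creator, in order. */
static int run_creator(const int *cpus, int n)
{
	struct create_request req;
	int i;

	assert(n >= 0);
	policy_switches = 0;
	for (i = 0; i < n; i++) {
		req.cpu = cpus[i];
		create_one(&req);
	}
	return policy_switches;
}
```

With requests arriving for cpus 0, 0, 1, 1, 0 the policy is only touched
when the preference changes. If several tasks could call create_one()
concurrently, the cache would need a lock or would have to go away.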


>
> > +void numa_cpubind_policy(int cpu)
> > +{
> > + nodemask_t mask;
> > +
> > + init_nodemask_of_node(&mask, cpu_to_node(cpu));
> > + do_set_mempolicy(MPOL_BIND, 0, &mask);
>
> You don't want bind, you want preferred, otherwise this
> will explode if the node is empty.
>

OK thanks, I'll test the patch with BIND or PREFERRED in x86_32 mode,
since I have one machine with two sockets and 2GB on each socket, so the
2nd node only has HIGHMEM, no LOWMEM.
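
As a rough picture of the difference, here is a toy userspace model, not
the real page allocator. It assumes a "bind" request may only use the
given node and a "preferred" request may fall back to another node; all
names (toy_system, toy_alloc_page, POLICY_*) are invented for this
sketch, and the real kernel failure mode under MPOL_BIND may differ.

```c
#include <assert.h>

#define NR_NODES 2

enum policy_mode { POLICY_BIND, POLICY_PREFERRED };

/* Free pages usable for the request on each node (toy numbers). */
struct toy_system {
	long free_pages[NR_NODES];
};

/*
 * Returns the node the page came from, or -1 on failure.
 * BIND: only the given node may be used.
 * PREFERRED: try the given node first, then fall back to any other node.
 */
static int toy_alloc_page(struct toy_system *sys, enum policy_mode mode, int node)
{
	int n;

	assert(node >= 0 && node < NR_NODES);

	if (sys->free_pages[node] > 0) {
		sys->free_pages[node]--;
		return node;
	}
	if (mode == POLICY_BIND)
		return -1;	/* no fallback allowed */

	for (n = 0; n < NR_NODES; n++) {
		if (sys->free_pages[n] > 0) {
			sys->free_pages[n]--;
			return n;
		}
	}
	return -1;	/* nothing free anywhere */
}
```

Modelling my machine as node 0 with some usable pages and node 1 with
none (no LOWMEM), a bind to node 1 fails while a preferred request for
node 1 is served from node 0.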

> Also this messes up the policy of the caller process. You really
> need to save/restore it.

Well, the caller process's duty is to create kthreads in a loop.

>
> And if the slab is configured for slab interleaving in
> the cpuset this will be ignored I think.
>



> Also I think the slab fast path ignores the policy anyways,
> the policy only acts when slab has to grab new pages.
> Are you sure this works at all?
>

It works on x86 at least: I tested this patch and got correct stacks for
the pktgen and ksoftirqd kthreads.

> It would be probably better to pass through the node
> to the low level allocation functions and use them
> there directly.
>

It would be difficult, because do_fork() is arch dependent.

> Problem is that this ends up in architecture specific code
> for the stack, so may be a larger patch.

I suggest arches that need slab to allocate kthread stacks do the
appropriate changes, because I am not able to make them myself.

On x86, we use the page allocator only, so the NUMA mempolicy is used.


