Re: [PATCHSET] percpu: generalize first chunk allocators and improve lpage NUMA support

From: Andi Kleen
Date: Wed Jul 01 2009 - 08:23:26 EST


On Wed, Jul 01, 2009 at 07:21:57PM +0900, Tejun Heo wrote:
> > using possible per cpu data I picked in current git: icmp.c
>
> I was talking about percpu allocator proper. Yeap, the major work
> would be in auditing and converting for_each_possible_cpu() users.

and testing. that's the hard part. cpu hotplug is normally not well
tested. Any code change that requires lots of new code for
it will be a problem because that code will then likely bitrot.

> > kfree(net->ipv4.icmp_sk);
> > net->ipv4.icmp_sk = NULL;
> > }
> >
> > You would need to convert that to use a CPU notifier and callbacks
> > setting up the sockets. Then make sure there are no races in all of
> > this. And get it somehow tested (where is the user base who
> > tests cpu hotplug?)
>
> Maybe it would be better to allocate percpu sockets as proper percpu
> variables.

That would be like percpu dentries.

A socket is a very complex structure. Trying to pull it out
of its normal allocators would be a bad idea.

I actually had a patch some time ago to move this one over
to callbacks, but it was already difficult, and I dropped
it.

> Initialization would still need callback mechanism tho. I
> was thinking about adding @init callback to percpu_alloc(), which
> would be much simpler than doing full cpu hotplug callback.

I'm not sure it will be actually simpler. It's not just
initializing the structure, but all the other set up to make
it known to the world.

>
> > And there is lots of similar code all over the tree
>
> For static percpu variables, it'll be mostly about converting
> for_each_possible_cpu() to for_each_used_cpu() as both allocation and
> initialization can be handled by percpu proper. For dynamic areas,

That's the easy part, but you would still need all the callbacks
for the extension case.

> allocation can be handled by percpu proper but cpus coming online
> would need more work to convert. It'll take some effort but there
> aren't too many alloc_percpu() users yet and I don't think it will be

alloc_percpu()? It affects all DEFINE_PER_CPU users. My current
tree has hundreds.

> too difficult. I wouldn't know for sure before I actually try tho.

I think it's clear that you haven't tried yet :)

I wrote quite a few per cpu callback handlers over the years and
in my experience they are all nasty code with subtle races. The problem
is that instead of having a single subset init function which
is just single threaded and doesn't need to worry about races
you now have multi threaded init, which tends to be a can of worms.

I think a far saner strategy than rewriting every user of DEFINE_PER_CPU,
ending up with lots of badly tested code is to:

- Fix the few large size percpu pigs that are problematic today to
allocate in a callback.
- Then once the per cpu data in all configurations is <200k (better
<100 in the non debug builds) again just keep pre-allocating like we always did
- Possibly adjust the vmalloc area on 32bit based on the possible
CPU map at the cost of the direct mapping, to make sure there's always enough
mapping space.

-Andi

--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/