Re: [PATCH 3/3] mm: when handling percpu_pagelist_fraction, use on_each_cpu()to set percpu pageset fields.

From: Cody P Schafer
Date: Tue Apr 09 2013 - 15:28:01 EST


On 04/08/2013 11:06 PM, Gilad Ben-Yossef wrote:
On Tue, Apr 9, 2013 at 9:03 AM, Gilad Ben-Yossef <gilad@xxxxxxxxxxxxx> wrote:


I also wonder whether there could be unexpected interactions between ->high
and ->batch not changing together atomically. For example, could adjusting
this knob cause ->batch to rise enough that it is greater than the previous
->high? If the code above then runs with the previous ->high, ->count
wouldn't be correct (checking this inside free_pcppages_bulk() might help on
this one issue).

You are right, but that can be treated in setup_pagelist_highmark() e.g.:

3993 static void setup_pagelist_highmark(struct per_cpu_pageset *p,
3994 unsigned long high)
3995 {
3996 struct per_cpu_pages *pcp;
unsigned int batch;
3997
3998 pcp = &p->pcp;
/* We're about to mess with PCP in an non atomic fashion.
Put an intermediate safe value of batch and make sure it
is visible before any other change */
pcp->batch = 1UL;
smb_mb();

3999 pcp->high = high;

and i think I missed another needed barrier here:
smp_mb();


4000 batch = max(1UL, high/4);
4001 if ((high/4) > (PAGE_SHIFT * 8))
4002 batch = PAGE_SHIFT * 8;

pcp->batch = batch;
4003 }



Yep, that appears to work, provided no additional users of ->batch and ->high show up. It seems we'll also need some locking to prevent concurrent updaters, but that is relatively light weight.

I'll roll up a new patchset that uses this methodology.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/