Re: [PATCH] mm: raise MemFree by reverting percpu_pagelist_fraction to 0

From: Minchan Kim
Date: Fri May 11 2012 - 05:01:38 EST


On 05/11/2012 05:38 PM, Minchan Kim wrote:

> On 05/11/2012 05:30 PM, Sasha Levin wrote:
>
>> On Fri, May 11, 2012 at 10:00 AM, Hugh Dickins <hughd@xxxxxxxxxx> wrote:
>>> Commit 93278814d359 "mm: fix division by 0 in percpu_pagelist_fraction()"
>>> mistakenly initialized percpu_pagelist_fraction to the sysctl's minimum 8,
>>> which leaves 1/8th of memory on percpu lists (on each cpu??); but most of
>>> us expect it to be left unset at 0 (and it's not then used as a divisor).
>>
>> I'm a bit confused about this, does it mean that once you set
>> percpu_pagelist_fraction to a value above the minimum, you can no
>> longer set it back to being 0?
>
>
> Unfortunately, yes. :(
> It's rather awkward and needs a fix.



I didn't have time, so I made a quick patch just to show the concept.
It is not tested and not thought through carefully.
If no one objects, I will send a formal patch with cleaner code.
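
To show the problem: with .extra1 pointing at min_percpu_pagelist_fract (8),
proc_dointvec_minmax() rejects a write of 0, so once the fraction has been
raised it can never be cleared again. Something like this (from reading the
code, not from a test run):

  # echo 100 > /proc/sys/vm/percpu_pagelist_fraction
  # echo 0 > /proc/sys/vm/percpu_pagelist_fraction
  echo: write error: Invalid argument

The patch below drops the sysctl minimum and checks the value in the
handler instead, restoring the boot-time pcp defaults when 0 is written.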

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..fabc52c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -132,7 +132,6 @@ static unsigned long dirty_bytes_min = 2 * PAGE_SIZE;
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
 static int minolduid;
-static int min_percpu_pagelist_fract = 8;
 
 static int ngroups_max = NGROUPS_MAX;
 static const int cap_last_cap = CAP_LAST_CAP;
@@ -1214,7 +1213,6 @@ static struct ctl_table vm_table[] = {
 		.maxlen	= sizeof(percpu_pagelist_fraction),
 		.mode	= 0644,
 		.proc_handler	= percpu_pagelist_fraction_sysctl_handler,
-		.extra1	= &min_percpu_pagelist_fract,
 	},
 #ifdef CONFIG_MMU
 	{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a13ded1..cc2353a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5161,12 +5161,30 @@ int percpu_pagelist_fraction_sysctl_handler(ctl_table *table, int write,
 	ret = proc_dointvec_minmax(table, write, buffer, length, ppos);
 	if (!write || (ret == -EINVAL))
 		return ret;
-	for_each_populated_zone(zone) {
-		for_each_possible_cpu(cpu) {
-			unsigned long high;
-			high = zone->present_pages / percpu_pagelist_fraction;
-			setup_pagelist_highmark(
-				per_cpu_ptr(zone->pageset, cpu), high);
+
+	if (percpu_pagelist_fraction < 8 && percpu_pagelist_fraction != 0)
+		return -EINVAL;
+
+	if (percpu_pagelist_fraction != 0) {
+		for_each_populated_zone(zone) {
+			for_each_possible_cpu(cpu) {
+				unsigned long high;
+				high = zone->present_pages / percpu_pagelist_fraction;
+				setup_pagelist_highmark(
+					per_cpu_ptr(zone->pageset, cpu), high);
+			}
+		}
+	}
+	else {
+		for_each_populated_zone(zone) {
+			for_each_possible_cpu(cpu) {
+				struct per_cpu_pageset *p = per_cpu_ptr(zone->pageset, cpu);
+				unsigned long batch = zone_batchsize(zone);
+				struct per_cpu_pages *pcp;
+				pcp = &p->pcp;
+				pcp->high = 6 * batch;
+				pcp->batch = max(1UL, 1 * batch);
+			}
 		}
 	}
 	return 0;
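
The else path just restores the boot-time defaults that setup_pageset()
computes from zone_batchsize() (pcp->high = 6 * batch, pcp->batch =
max(1UL, batch)), so writing 0 should behave as if the sysctl had never
been touched.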


--
Kind regards,
Minchan Kim