Re: [PATCH]: Clean up of __alloc_pages

From: Paul Jackson
Date: Sat Oct 29 2005 - 19:17:51 EST


Seth wroteL
> @@ -851,19 +853,11 @@
> * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
> * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
> */
> - for (i = 0; (z = zones[i]) != NULL; i++) {
> - if (!zone_watermark_ok(z, order, z->pages_min,
> - classzone_idx, can_try_harder,
> - gfp_mask & __GFP_HIGH))
> - continue;
> -
> - if (wait && !cpuset_zone_allowed(z, gfp_mask))
> - continue;
> -
> - page = buffered_rmqueue(z, order, gfp_mask);
> - if (page)
> - goto got_pg;
> - }
> + if (!wait)
> + page = get_page_from_freelist(gfp_mask, order, zones,
> + can_try_harder);

Thanks for the clean-up work. Good stuff.

I think you've changed the affect that the cpuset check has on the
above pass.

As you know, the above is the last chance we have for GFP_ATOMIC (can't
wait) allocations before getting into the oom_kill code. The code had
been written to ignore cpuset constraints for GFP_ATOMIC (that is,
"!wait") allocations. The intent is to allow taking GFP_ATOMIC memory
from any damn node we can find it on, rather than start killing.

Your change will call into get_page_from_freelist() in such cases,
where the cpuset check is still done.

I would be tempted instead to:
1) pass 'can_try_harder' value of -1, instead of the the local value
of 1 (which it certainly is, since we are in !wait code).
2) condition the cpuset check in get_page_from_freelist() on
can_try_harder being not equal to -1.

The item (2) -does- change the existing cpuset conditions as well,
allowing cpuset boundaries to be violated for the cases that would
"allow future memory freeing" (such as GFP_MEMALLOC or TIF_MEMDIE),
whereas until now, we did not allow violating cpuset conditions
for this. But that is arguably a good change.

The following patch, on top of yours, shows what I have in mind here:

--- 2.6.14-rc5-mm1.orig/mm/page_alloc.c 2005-10-29 14:45:07.000000000 -0700
+++ 2.6.14-rc5-mm1/mm/page_alloc.c 2005-10-29 16:35:55.000000000 -0700
@@ -777,7 +777,7 @@ get_page_from_freelist(unsigned int __no
* See also cpuset_zone_allowed() comment in kernel/cpuset.c.
*/
for (i = 0; (z = zones[i]) != NULL; i++) {
- if (!cpuset_zone_allowed(z, gfp_mask))
+ if (can_try_harder != -1 && !cpuset_zone_allowed(z, gfp_mask))
continue;

if ((can_try_harder >= 0) &&
@@ -940,8 +940,7 @@ restart:
* See also cpuset_zone_allowed() comment in kernel/cpuset.c.
*/
if (!wait)
- page = get_page_from_freelist(gfp_mask, order, zones,
- can_try_harder);
+ page = get_page_from_freelist(gfp_mask, order, zones, -1);
if (page)
goto got_pg;


However ...
1) The above also would change __GFP_HIGH and rt allocations to also
ignore mins entirely, instead of just going deeper into reserves,
on this pass. That is likely not good.
2) I can't get my head wrapped around Nick's reply to this patch.

So my above patch is no doubt flawed in one or more ways.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@xxxxxxx> 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/