Re: [PATCH RFC] mm readahead: Fix the readahead fail in case of emptynuma node

From: Raghavendra K T
Date: Thu Dec 05 2013 - 00:50:17 EST


On 12/05/2013 03:18 AM, Andrew Morton wrote:
On Wed, 04 Dec 2013 14:38:11 +0530 Raghavendra K T <raghavendra.kt@xxxxxxxxxxxxxxxxxx> wrote:

On 12/04/2013 02:11 PM, Andrew Morton wrote:
:
: This patch takes it all out and applies the same upper limit as is used in
: sys_readahead() - half the inactive list.
:
: +/*
: + * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
: + * sensible upper limit.
: + */
: +unsigned long max_sane_readahead(unsigned long nr)
: +{
: + unsigned long active;
: + unsigned long inactive;
: +
: + get_zone_counts(&active, &inactive);
: + return min(nr, inactive / 2);
: +}


Hi Andrew, Thanks for digging out. So it seems like earlier we had not
even considered free pages?

And one would need to go back further still to understand the rationale
for the sys_readahead() decision and that even predates the BK repo.

iirc the thinking was that we need _some_ limit on readahead size so
the user can't go and do ridiculously large amounts of readahead via
sys_readahead(). But that doesn't make a lot of sense because the user
could do the same thing with plain old read().


True.

So for argument's sake I'm thinking we just kill it altogether and
permit arbitrarily large readahead:

--- a/mm/readahead.c~a
+++ a/mm/readahead.c
@@ -238,13 +238,12 @@ int force_page_cache_readahead(struct ad
}

/*
- * Given a desired number of PAGE_CACHE_SIZE readahead pages, return a
- * sensible upper limit.
+ * max_sane_readahead() is disabled. It can later be removed altogether, but
+ * let's keep a skeleton in place for now, in case disabling was the wrong call.
*/
unsigned long max_sane_readahead(unsigned long nr)
{
- return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE_FILE)
- + node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
+ return nr;
}


I had something like below in mind for posting. But it looks
simple now with your patch.


unsigned long max_sane_readahead(unsigned long nr)
{
int nid;
unsigned long free_page = 0;

for_each_node_state(nid, N_MEMORY)
free_page += node_page_state(nid, NR_INACTIVE_FILE)
+ node_page_state(nid, NR_FREE_PAGES);

/*
* Readahead onto remote memory is better than no readahead when local
* numa node does not have memory. We sanitize readahead size depending
* on potential free memory in the whole system.
*/
return min(nr, free_page / (2 * nr_node_ids));

Or if we wanted to avoid iteration on nodes simply returning

something like nr/8 or something like that for remote numa fault cases.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/