Re: [PATCH] MADVISE_FREE, THP: Fix madvise_free_huge_pmd return value after splitting

From: Huang\, Ying
Date: Thu Jun 16 2016 - 23:15:56 EST


"Huang, Ying" <ying.huang@xxxxxxxxx> writes:

> From: Huang Ying <ying.huang@xxxxxxxxx>
>
> madvise_free_huge_pmd should return 0 if the fallback PTE operations are
> required. In madvise_free_huge_pmd, if part pages of THP are discarded,
> the THP will be split and fallback PTE operations should be used if
> splitting succeeds. But the original code will make fallback PTE
> operations skipped, after splitting succeeds. Fix that via make
> madvise_free_huge_pmd return 0 after splitting successfully, so that the
> fallback PTE operations will be done.
>
> Know issues: if my understanding were correct, return 1 from
> madvise_free_huge_pmd means the following processing for the PMD should
> be skipped, while return 0 means the following processing is still
> needed. So the function should return 0 only if the THP is split
> successfully or the PMD is not trans huge. But the pmd_trans_unstable
> after madvise_free_huge_pmd guarantee the following processing will be
> skipped for huge PMD. So current code can run properly. But if my
> understanding were correct, we can clean up return code of
> madvise_free_huge_pmd accordingly.
>

This patch was tested with the below program, given some memory
pressure, the the value read back is 0 with the patch, that is, THP is
split, freed and zero page is mapped. Without the patch, the value read
back is still 1, that is, the page is not freed.

Best Regards,
Huang, Ying

------------------------------->
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* free pages only if memory pressure */
#endif

#define ONE_MB (1024 * 1024)
#define THP_SIZE (2 * ONE_MB)
#define THP_MASK (THP_SIZE - 1)
#define MAP_SIZE (16 * ONE_MB)

#define ERR_EXIT_ON(cond, msg) \
do { \
int __cond_in_macro = (cond); \
if (__cond_in_macro) \
error_exit(__cond_in_macro, (msg)); \
} while (0)

void error_exit(int ret, const char *msg)
{
fprintf(stderr, "Error: %s, ret : %d, error: %s",
msg, ret, strerror(errno));
exit(1);
}

void write_and_free()
{
int ret;
void *addr;
int *pn;

addr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
ERR_EXIT_ON(ret, "mmap");
ret = madvise(addr, MAP_SIZE, MADV_HUGEPAGE);
ERR_EXIT_ON(ret, "advise hugepage");
pn = (int *)(((unsigned long)addr + THP_SIZE) & ~THP_MASK);
*pn = 1;
printf("map 1 THP, hit any key to free part of it: ");
fgetc(stdin);
ret = madvise(pn, ONE_MB, MADV_FREE);
ERR_EXIT_ON(ret, "advise free");
printf("freed part of THP, hit any key to get the new value: ");
fgetc(stdin);
printf("val: %d\n", *pn);
}

int main(int argc, char *argv[])
{
write_and_free();
return 0;
}