Re: MADV_DONTNEED semantics? Was: [RFC PATCH] mm: madvise: Ignore repeated MADV_DONTNEED hints

From: Minchan Kim
Date: Wed Feb 04 2015 - 20:08:19 EST


Hello,

On Wed, Feb 04, 2015 at 08:24:27PM +0100, Michael Kerrisk (man-pages) wrote:
> On 4 February 2015 at 18:02, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> > On 02/04/2015 03:00 PM, Michael Kerrisk (man-pages) wrote:
> >>
> >> Hello Vlastimil,
> >>
> >> On 4 February 2015 at 14:46, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>>>>
> >>>>> - that covers mlocking ok, not sure if the rest fits the "shared pages"
> >>>>> case
> >>>>> though. I dont see any check for other kinds of shared pages in the
> >>>>> code.
> >>>>
> >>>>
> >>>> Agreed. "shared" here seems confused. I've removed it. And I've
> >>>> added mention of "Huge TLB pages" for this error.
> >>>
> >>>
> >>> Thanks.
> >>
> >>
> >> I also added those cases for MADV_REMOVE, BTW.
> >
> >
> > Right. There's also the following for MADV_REMOVE that needs updating:
> >
> > "Currently, only shmfs/tmpfs supports this; other filesystems return with
> > the error ENOSYS."
> >
> > - it's not just shmem/tmpfs anymore. It should be best to refer to
> > fallocate(2) option FALLOC_FL_PUNCH_HOLE which seems to be (more) up to
> > date.
> >
> > - AFAICS it doesn't return ENOSYS but EOPNOTSUPP. Also neither error code is
> > listed in the ERRORS section.
>
> Yup, I recently added that as well, based on a patch from Jan Chaloupka.
>
> >>>>>>> - The word "will result" did sound as a guarantee at least to me. So
> >>>>>>> here it
> >>>>>>> could be changed to "may result (unless the advice is ignored)"?
> >>>>>>
> >>>>>> It's too late to fix documentation. Applications already depends on
> >>>>>> the
> >>>>>> beheviour.
> >>>>>
> >>>>> Right, so as long as they check for EINVAL, it should be safe. It
> >>>>> appears
> >>>>> that
> >>>>> jemalloc does.
> >>>>
> >>>> So, first a brief question: in the cases where the call does not error
> >>>> out,
> >>>> are we agreed that in the current implementation, MADV_DONTNEED will
> >>>> always result in zero-filled pages when the region is faulted back in
> >>>> (when we consider pages that are not backed by a file)?
> >>>
> >>> I'd agree at this point.
> >>
> >> Thanks for the confirmation.
> >>
> >>> Also we should probably mention anonymously shared pages (shmem). I think
> >>> they behave the same as file here.
> >>
> >> You mean tmpfs here, right? (I don't keep all of the synonyms straight.)
> >
> > shmem is tmpfs (that by itself would fit under "files" just fine), but also
> > sys V segments created by shmget(2) and also mappings created by mmap with
> > MAP_SHARED | MAP_ANONYMOUS. I'm not sure if there's a single manpage to
> > refer to the full list.
>
> So, how about this text:
>
> After a successful MADV_DONTNEED operation, the semanâ
> tics of memory access in the specified region are
> changed: subsequent accesses of pages in the range
> will succeed, but will result in either reloading of
> the memory contents from the underlying mapped file
> (for shared file mappings, shared anonymous mappings,
> and shmem-based techniques such as System V shared
> memory segments) or zero-fill-on-demand pages for
> anonymous private mappings.

Hmm, I'd like to clarify.

Whether it was intention or not, some of userspace developers thought
about that syscall drop pages instantly if was no-error return so that
they will see more free pages(ie, rss for the process will be decreased)
with keeping the VMA. Can we rely on it?

And we should make error section, too.
"locked" covers mlock(2) and you said you will add hugetlb. Then,
VM_PFNMAP? In that case, it fails. How can we say about VM_PFNMAP?
special mapping for some drivers?

One more thing, "The kernel is free to ignore the advice".
It conflicts "This call does not influence the semantics of the
application (except in the case of MADV_DONTNEED)" so
is it okay we can believe "The kernel is free to ingmore the advise
except MADV_DONTNEED"?

Thanks.
--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/