Re: [PATCH v3 RESEND] mm: shmem: implement POSIX_FADV_[WILL|DONT]NEED for shmem

From: Mark Hemment
Date: Wed Jan 12 2022 - 06:38:42 EST


On Mon, 10 Jan 2022 at 15:14, Charan Teja Kalla
<quic_charante@xxxxxxxxxxx> wrote:
>
> Thanks again Mark for the review comments!!
>
> On 1/10/2022 6:06 PM, Mark Hemment wrote:
> > On Thu, 6 Jan 2022 at 17:06, Charan Teja Reddy
> > <quic_charante@xxxxxxxxxxx> wrote:
> >>
> >> From: Charan Teja Reddy <charante@xxxxxxxxxxxxxx>
> >>
> >> Currently fadvise(2) is supported only for the files that doesn't
> >> associated with noop_backing_dev_info thus for the files, like shmem,
> >> fadvise results into NOP. But then there is file_operations->fadvise()
> >> that lets the file systems to implement their own fadvise
> >> implementation. Use this support to implement some of the POSIX_FADV_XXX
> >> functionality for shmem files.
> > ...
> >> +static void shmem_isolate_pages_range(struct address_space *mapping, loff_t start,
> >> + loff_t end, struct list_head *list)
> >> +{
> >> + XA_STATE(xas, &mapping->i_pages, start);
> >> + struct page *page;
> >> +
> >> + rcu_read_lock();
> >> + xas_for_each(&xas, page, end) {
> >> + if (xas_retry(&xas, page))
> >> + continue;
> >> + if (xa_is_value(page))
> >> + continue;
> >> + if (!get_page_unless_zero(page))
> >> + continue;
> >> + if (isolate_lru_page(page))
> >> + continue;
> >
> > Need to unwind the get_page on failure to isolate.
>
> Will be done.
>
> >
> > Should PageUnevicitable() pages (SHM_LOCK) be skipped?
> > (That is, does SHM_LOCK override DONTNEED?)
>
>
> Should be skipped. Will be done.
>
> >
> > ...
> >> +static int shmem_fadvise_dontneed(struct address_space *mapping, loff_t start,
> >> + loff_t end)
> >> +{
> >> + int ret;
> >> + struct page *page;
> >> + LIST_HEAD(list);
> >> + struct writeback_control wbc = {
> >> + .sync_mode = WB_SYNC_NONE,
> >> + .nr_to_write = LONG_MAX,
> >> + .range_start = 0,
> >> + .range_end = LLONG_MAX,
> >> + .for_reclaim = 1,
> >> + };
> >> +
> >> + if (!shmem_mapping(mapping))
> >> + return -EINVAL;
> >> +
> >> + if (!total_swap_pages)
> >> + return 0;
> >> +
> >> + lru_add_drain();
> >> + shmem_isolate_pages_range(mapping, start, end, &list);
> >> +
> >> + while (!list_empty(&list)) {
> >> + page = lru_to_page(&list);
> >> + list_del(&page->lru);
> >> + if (page_mapped(page))
> >> + goto keep;
> >> + if (!trylock_page(page))
> >> + goto keep;
> >> + if (unlikely(PageTransHuge(page))) {
> >> + if (split_huge_page_to_list(page, &list))
> >> + goto keep;
> >> + }
> >
> > I don't know the shmem code and the lifecycle of a shm-page, so
> > genuine questions;
> > When the try-lock succeeds, should there be a test for PageWriteback()
> > (page skipped if true)? Also, does page->mapping need to be tested
> > for NULL to prevent races with deletion from the page-cache?
>
> I failed to envisage it. I should have considered both these conditions
> here. BTW, I am just thinking about why we shouldn't use
> reclaim_pages(page_list) function here with an extra set_page_dirty() on
> a page that is isolated? It just call the shrink_page_list() where all
> these conditions are properly handled. What is your opinion here?

Should be possible to use reclaim_pages() (I haven't look closely).
It might actually be good to use this function, as will do some
congestion throttling. Although it will always try to unmap
pages (note: your page_mapped() test is 'unstable' as done without the
page locked), so might give behaviour you want to avoid.
Note: reclaim_pages() is already used for madvise(PAGEOUT). The shmem
code would need to prepare page(s) to help shrink_page_list() to make
progress (see madvise.c:madvise_cold_or_pageout_pte_range()).

Taking a step back; is fadvise(DONTNEED) really needed/wanted? Yes,
you gave a usecase (which I cut from this thread in my earlier reply),
but I'm not familiar with various shmem uses to know if this feature
is needed. Someone else will need to answer this.

Cheers,
Mark

>
> >
> > ...
> >> +
> >> + clear_page_dirty_for_io(page);
> >> + SetPageReclaim(page);
> >> + ret = shmem_writepage(page, &wbc);
> >> + if (ret || PageWriteback(page)) {
> >> + if (ret)
> >> + unlock_page(page);
> >> + goto keep;
> >> + }
> >> +
> >> + if (!PageWriteback(page))
> >> + ClearPageReclaim(page);
> >> +
> >> + /*
> >> + * shmem_writepage() place the page in the swapcache.
> >> + * Delete the page from the swapcache and release the
> >> + * page.
> >> + */
> >> + __mod_node_page_state(page_pgdat(page),
> >> + NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page));
> >> + lock_page(page);
> >> + delete_from_swap_cache(page);
> >> + unlock_page(page);
> >> + put_page(page);
> >> + continue;
> >> +keep:
> >> + putback_lru_page(page);
> >> + __mod_node_page_state(page_pgdat(page),
> >> + NR_ISOLATED_ANON + page_is_file_lru(page), compound_nr(page));
> >> + }
> >
> > The putback_lru_page() drops the last reference hold this code has on
> > 'page'. Is it safe to use 'page' after dropping this reference?
>
> True. Will correct it in the next revision.
>
> >
> > Cheers,
> > Mark
> >