Re: [PATCH -mm 16/25] SHM_LOCKED pages are non-reclaimable

From: Andrew Morton
Date: Fri Jun 06 2008 - 21:09:04 EST


On Fri, 06 Jun 2008 16:28:54 -0400
Rik van Riel <riel@xxxxxxxxxx> wrote:

> From: Lee Schermerhorn <Lee.Schermerhorn@xxxxxx>
>
> Against: 2.6.26-rc2-mm1
>
> While working with Nick Piggin's mlock patches,

Change log refers to information which its reader has not got a hope
of actually locating.

> I noticed that
> shmem segments locked via shmctl(SHM_LOCK) were not being handled.
> SHM_LOCKed pages work like ramdisk pages

Well, OK. As long as one remembers that "ramdisk pages" are different
from "pages of a file which is on ramdisk". Tricky, huh?

> --the writeback function
> just redirties the page so that it can't be reclaimed. Deal with
> these using the same approach as for ram disk pages.
>
> Use the AS_NORECLAIM flag to mark address_space of SHM_LOCKed
> shared memory regions as non-reclaimable. Then these pages
> will be culled off the normal LRU lists during vmscan.

So I guess there's more justification for handling these pages in this
manner, because someone could come along later and unlock them. But
that isn't true of /dev/ram0 pages and ramfs pages, etc.

> Add new wrapper function to clear the mapping's noreclaim state
> when/if the shared memory segment is unlocked via SHM_UNLOCK.
>
> Add 'scan_mapping_noreclaim_pages()' to mm/vmscan.c to scan all
> pages in the shmem segment's mapping [struct address_space] for
> reclaimability now that they're no longer locked. Move any that
> are reclaimable to the appropriate zone lru list. Note that
> scan_mapping_noreclaim_pages() must be able to sleep in lock_page(),
> so we can't call it holding the shmem info spinlock nor the shmid
> spinlock. So, we pass the mapping [address_space] back to shmctl()
> on SHM_UNLOCK for rescuing any nonreclaimable pages after dropping
> the spinlocks. Once we drop the shmid lock, the backing shmem file
> can be deleted if the calling task doesn't have the shm area
> attached. To handle this, we take an extra reference on the file
> before dropping the shmid lock and drop the reference after scanning
> the mapping's noreclaim pages.
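
The pin-before-unlock ordering described above can be modelled in a few
lines of userspace C. This is only a toy sketch: toy_file, toy_get(),
toy_put() and toy_unlock_and_scan() are hypothetical names, not kernel
interfaces, and the "lock" exists only as comments. It shows why the extra
reference keeps the file alive for the scan even if the segment is removed
in the window after the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the backing shmem file; not a kernel type. */
struct toy_file {
	int refcount;
	bool freed;
};

static void toy_get(struct toy_file *f)
{
	assert(!f->freed);
	f->refcount++;
}

static void toy_put(struct toy_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->freed = true;
}

/*
 * Models the SHM_UNLOCK path: pin the file, drop the lock, scan, unpin.
 * Returns true if the file was still alive while the scan would run.
 */
static bool toy_unlock_and_scan(struct toy_file *f, bool removed_meanwhile)
{
	bool alive_during_scan;

	/* the "shmid lock" is held at this point */
	toy_get(f);		/* extra reference taken before unlocking */
	/* the "shmid lock" is dropped at this point */

	if (removed_meanwhile)
		toy_put(f);	/* models removal dropping the segment's own reference */

	alive_during_scan = !f->freed;	/* the mapping scan would run here */
	toy_put(f);			/* drop the extra reference after the scan */
	return alive_during_scan;
}
```

Without the toy_get() call, the removal in the middle would drop the last
reference and the scan would touch a freed object.
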
>
>
> ...
>
> +
> +/**
> + * check_move_noreclaim_page - check page for reclaimability and move to appropriate zone lru list
> + * @page: page to check reclaimability and move to appropriate lru list
> + * @zone: zone page is in
> + *
> + * Checks a page for reclaimability and moves the page to the appropriate
> + * zone lru list.
> + *
> + * Restrictions: zone->lru_lock must be held, page must be on LRU and must
> + * have PageNoreclaim set.
> + */
> +static void check_move_noreclaim_page(struct page *page, struct zone *zone)
> +{
> +
> +	ClearPageNoreclaim(page); /* for page_reclaimable() */

Confused. Didn't we just lose track of our NR_NORECLAIM accounting?
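
One way to reason about the question is a userspace toy model of the same
control flow. Everything here is hypothetical (toy_page, toy_zone,
toy_check_move() and toy_accounting_ok() are illustration names, not kernel
code), and it assumes the flag and the counter are only ever changed under
the same lock. Under that assumption the flag is dropped only for the
duration of the reclaimability test, and the counter is adjusted only on
the branch where the page actually leaves the noreclaim list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct toy_page {
	bool noreclaim;		/* models PageNoreclaim */
	bool reclaimable;	/* models the answer page_reclaimable() would give */
};

struct toy_zone {
	long nr_noreclaim;	/* models the NR_NORECLAIM counter */
	long nr_inactive;	/* models the inactive-list counter */
};

/* Follows the control flow of check_move_noreclaim_page() in the patch. */
static void toy_check_move(struct toy_page *page, struct toy_zone *zone)
{
	assert(page->noreclaim);	/* precondition from the kerneldoc */

	page->noreclaim = false;	/* flag dropped, counter not yet touched */
	if (page->reclaimable) {
		zone->nr_noreclaim--;	/* page really leaves the list */
		zone->nr_inactive++;
	} else {
		page->noreclaim = true;	/* flag restored, counter unchanged */
	}
}

/* Invariant: the counter equals the number of pages with the flag set. */
static bool toy_accounting_ok(const struct toy_page *pages, int n,
			      const struct toy_zone *zone)
{
	long flagged = 0;

	for (int i = 0; i < n; i++)
		flagged += pages[i].noreclaim;
	return flagged == zone->nr_noreclaim;
}
```

The invariant holds on entry and on exit of toy_check_move(), but not in
between, which is the window the question is about.
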

> +	if (page_reclaimable(page, NULL)) {
> +		enum lru_list l = LRU_INACTIVE_ANON + page_file_cache(page);
> +		__dec_zone_state(zone, NR_NORECLAIM);
> +		list_move(&page->lru, &zone->list[l]);
> +		__inc_zone_state(zone, NR_INACTIVE_ANON + l);
> +	} else {
> +		/*
> +		 * rotate noreclaim list
> +		 */
> +		SetPageNoreclaim(page);
> +		list_move(&page->lru, &zone->list[LRU_NORECLAIM]);
> +	}
> +}
> +
> +/**
> + * scan_mapping_noreclaim_pages - scan an address space for reclaimable pages
> + * @mapping: struct address_space to scan for reclaimable pages
> + *
> + * Scan all pages in mapping. Check non-reclaimable pages for
> + * reclaimability and move them to the appropriate zone lru list.
> + */
> +void scan_mapping_noreclaim_pages(struct address_space *mapping)
> +{
> +	pgoff_t next = 0;
> +	pgoff_t end = (i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1) >>
> +			 PAGE_CACHE_SHIFT;
> +	struct zone *zone;
> +	struct pagevec pvec;
> +
> +	if (mapping->nrpages == 0)
> +		return;
> +
> +	pagevec_init(&pvec, 0);
> +	while (next < end &&
> +		pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
> +		int i;
> +
> +		zone = NULL;
> +
> +		for (i = 0; i < pagevec_count(&pvec); i++) {
> +			struct page *page = pvec.pages[i];
> +			pgoff_t page_index = page->index;
> +			struct zone *pagezone = page_zone(page);
> +
> +			if (page_index > next)
> +				next = page_index;
> +			next++;
> +
> +			if (TestSetPageLocked(page)) {
> +				/*
> +				 * OK, let's do it the hard way...
> +				 */
> +				if (zone)
> +					spin_unlock_irq(&zone->lru_lock);
> +				zone = NULL;
> +				lock_page(page);
> +			}
> +
> +			if (pagezone != zone) {
> +				if (zone)
> +					spin_unlock_irq(&zone->lru_lock);
> +				zone = pagezone;
> +				spin_lock_irq(&zone->lru_lock);
> +			}
> +
> +			if (PageLRU(page) && PageNoreclaim(page))
> +				check_move_noreclaim_page(page, zone);
> +
> +			unlock_page(page);
> +
> +		}
> +		if (zone)
> +			spin_unlock_irq(&zone->lru_lock);
> +		pagevec_release(&pvec);
> +	}
> +
> +}

This function can spend fantastically large amounts of time under
spin_lock_irq().
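
A common way to keep lock hold times bounded is to cap the work done per
acquisition and release the lock between batches. The sketch below is a
userspace illustration only: it uses a pthread mutex in place of the
spinlock, and TOY_BATCH, toy_scan_stats and toy_scan() are hypothetical
names. The posted loop does release lru_lock after each pagevec, so this
only makes the per-hold bound explicit and measurable. It says nothing
about the total time a scan of a large mapping spends with interrupts
disabled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TOY_BATCH 16	/* hypothetical cap on pages handled per lock hold */

struct toy_scan_stats {
	size_t processed;	/* pages visited in total */
	size_t max_held;	/* most pages handled in a single lock hold */
	size_t lock_holds;	/* number of lock/unlock pairs */
};

/* Visit npages items, never handling more than TOY_BATCH per lock hold. */
static struct toy_scan_stats toy_scan(size_t npages, pthread_mutex_t *lock)
{
	struct toy_scan_stats st = { 0, 0, 0 };
	size_t done = 0;

	while (done < npages) {
		size_t held = 0;

		pthread_mutex_lock(lock);
		st.lock_holds++;
		while (done < npages && held < TOY_BATCH) {
			/* per-page work would go here */
			done++;
			held++;
		}
		pthread_mutex_unlock(lock);	/* others can take the lock here */

		if (held > st.max_held)
			st.max_held = held;
	}
	st.processed = done;
	return st;
}
```

With 100 pages and a batch of 16, the lock is taken 7 times and no single
hold covers more than 16 pages.
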
