Re: [RFC PATCH 0/6] Do not call ->writepage[s] from direct reclaim and use a_ops->writepages() where possible

From: Andrea Arcangeli
Date: Tue Jun 15 2010 - 13:38:24 EST


On Tue, Jun 15, 2010 at 05:30:44PM +0100, Mel Gorman wrote:
> See this
>
> STATIC int
> xfs_vm_writepage(
> 	struct page *page,
> 	struct writeback_control *wbc)
> {
> 	int error;
> 	int need_trans;
> 	int delalloc, unmapped, unwritten;
> 	struct inode *inode = page->mapping->host;
>
> 	trace_xfs_writepage(inode, page, 0);
>
> 	/*
> 	 * Refuse to write the page out if we are called from reclaim
> 	 * context.
> 	 *
> 	 * This is primarily to avoid stack overflows when called from deep
> 	 * used stacks in random callers for direct reclaim, but disabling
> 	 * reclaim for kswapd is a nice side-effect as kswapd causes rather
> 	 * suboptimal I/O patterns, too.
> 	 *
> 	 * This should really be done by the core VM, but until that happens
> 	 * filesystems like XFS, btrfs and ext4 have to take care of this
> 	 * by themselves.
> 	 */
> 	if (current->flags & PF_MEMALLOC)
> 		goto out_fail;

So it's under xfs/linux-2.6... ;) I guess this dates back to the
xfs/irix, xfs/freebsd days, no prob.

> Again, missing the code to do it and am missing data showing that not
> writing pages in direct reclaim is really a bad idea.

Your code is functionally fine; my point is that it's not just
writepage, as shown by the PF_MEMALLOC check in ext4.

> Other than the whole "lacking the code" thing and it's still not clear that
> writing from direct reclaim is absolutely necessary for VM stability considering
> it's been ignored today by at least two filesystems. I can add the throttling
> logic if it'd make you happier but I know it'd be at least two weeks
> before I could start from scratch on a stack-switch-based-solution and a
> PITA considering that I'm not convinced it's necessary :)

The reason things are working, I think, is wait_on_page_writeback. By
the time lots of RAM is full of dirty pages, pdflush and friends will
have submitted the I/O, and the VM will still wait for that I/O to
complete. Waiting eats no stack; submitting I/O does. So that explains
why everything works fine.
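
To make the argument concrete, here is a toy userspace model in C. It
is only a sketch, not kernel code: every toy_* name and the flag value
are made up for illustration, and only the shape of the PF_MEMALLOC
check and the "reclaim waits, the flusher submits" split are taken from
the discussion above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag only; this is not the kernel's PF_MEMALLOC value. */
#define TOY_PF_MEMALLOC 0x1u

struct toy_task {
	unsigned int flags;	/* TOY_PF_MEMALLOC is set while in reclaim */
};

struct toy_page {
	bool dirty;		/* modified, no I/O submitted yet */
	bool writeback;		/* I/O submitted, not yet completed */
};

enum toy_result { TOY_SUBMITTED, TOY_REFUSED, TOY_CLEAN };

/*
 * Hypothetical stand-in for a filesystem ->writepage that refuses to
 * submit I/O from reclaim context, mirroring the quoted check.
 */
static enum toy_result toy_writepage(const struct toy_task *cur,
				     struct toy_page *page)
{
	if (!page->dirty)
		return TOY_CLEAN;
	if (cur->flags & TOY_PF_MEMALLOC)
		return TOY_REFUSED;	/* page stays dirty */
	page->dirty = false;
	page->writeback = true;		/* the stack-hungry submit path */
	return TOY_SUBMITTED;
}

/* Hypothetical stand-in for I/O completion. */
static void toy_end_writeback(struct toy_page *page)
{
	page->writeback = false;
}

/*
 * A page is freeable only when it is neither dirty nor under
 * writeback. Reclaim never submits I/O in this model; the real VM
 * would sleep in wait_on_page_writeback() for in-flight pages.
 */
static size_t toy_count_reclaimable(const struct toy_page *pages, size_t n)
{
	size_t i, freeable = 0;

	for (i = 0; i < n; i++)
		if (!pages[i].dirty && !pages[i].writeback)
			freeable++;
	return freeable;
}
```

In this model a task with TOY_PF_MEMALLOC set never moves a page out
of the dirty state. Pages become freeable only after a task without
the flag has submitted them and toy_end_writeback() has run, which is
the dependency on the flusher threads described above.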

It'd be interesting to verify that things don't fall apart with
current xfs if you swapon ./file_on_xfs instead of /dev/something.
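
For anyone who wants to script that test, a minimal sketch of the
swapon step using the Linux swapon(2)/swapoff(2) calls follows. The
helper names are hypothetical, and it assumes the file has already
been preallocated and formatted with mkswap and that the caller has
CAP_SYS_ADMIN.

```c
#include <errno.h>
#include <sys/swap.h>

/*
 * Enable swap on a regular file (Linux-specific). The file must
 * already be a valid swap area. Returns 0 on success or a negative
 * errno value on failure.
 */
static int enable_swap_file(const char *path)
{
	if (swapon(path, 0) == -1)
		return -errno;
	return 0;
}

/* Undo enable_swap_file(); same return convention. */
static int disable_swap_file(const char *path)
{
	if (swapoff(path) == -1)
		return -errno;
	return 0;
}
```

Calling enable_swap_file("./file_on_xfs") as root and then applying
memory pressure should push swap-out through the filesystem's write
path rather than straight to a block device.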