Re: [PATCH 24/28] xfs: reclaim inodes from the LRU

From: Dave Chinner
Date: Thu Nov 14 2019 - 16:52:05 EST


On Wed, Nov 06, 2019 at 12:21:04PM -0500, Brian Foster wrote:
> On Fri, Nov 01, 2019 at 10:46:14AM +1100, Dave Chinner wrote:
> > -	struct xfs_inode	*ip,
> > -	int			flags,
> > -	xfs_lsn_t		*lsn)
> > +enum lru_status
> > +xfs_inode_reclaim_isolate(
> > +	struct list_head	*item,
> > +	struct list_lru_one	*lru,
> > +	spinlock_t		*lru_lock,
>
> Did we ever establish whether we should cycle the lru_lock during long
> running scans?

I'm still evaluating this.

In theory, because the isolate callback is non-blocking, the lock
hold time isn't huge, but OTOH I think the hold time is causing
lock contention problems on unlink workloads. I've found a bunch of
perf/blocking problems in the last few days, and each one I sort
out puts more pressure on the lru list lock during unlinks.
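
The obvious way to cycle it from inside the isolate callback would
be something like the hack below (totally untested, and ra->scanned
and the batch size are made up), but LRU_RETRY restarts the walk
from the head of the list, so I'm not convinced it's a win over
just bounding the walk size from the caller:

	/*
	 * Drop the lru_lock every so often if someone else wants it.
	 * LRU_RETRY requires that we return with the lock held; the
	 * walker then restarts from the head of the list, so this
	 * only makes progress because removed inodes are gone and
	 * rotated inodes have been moved to the tail.
	 */
	if (!(++ra->scanned % 32) && spin_is_contended(lru_lock)) {
		spin_unlock(lru_lock);
		cond_resched();
		spin_lock(lru_lock);
		return LRU_RETRY;
	}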

> > -	/*
> > -	 * Do unlocked checks to see if the inode already is being flushed or in
> > -	 * reclaim to avoid lock traffic. If the inode is not clean, return the
> > -	 * position in the AIL for the caller to push to.
> > -	 */
> > -	if (!xfs_inode_clean(ip)) {
> > -		*lsn = ip->i_itemp->ili_item.li_lsn;
> > -		return false;
> > +	if (!__xfs_iflock_nowait(ip)) {
> > +		lsn = ip->i_itemp->ili_item.li_lsn;
>
> This looks like a potential crash vector if we ever got here with a
> clean inode.

I'm not sure we can ever fail a flush lock attempt on a clean inode.
But I'll rework the lsn grabbing, I think.
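
At minimum, guard the dereference so a clean inode (which may have
no inode log item at all) can't oops us. Something like this,
untested:

	if (!__xfs_iflock_nowait(ip)) {
		/* clean inodes may not have an inode log item */
		if (!xfs_inode_clean(ip))
			lsn = ip->i_itemp->ili_item.li_lsn;
		ra->dirty_skipped++;
		goto out_unlock_inode;
	}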

> > +		ra->dirty_skipped++;
> > +		goto out_unlock_inode;
> > 	}
> >
> > -	if (__xfs_iflags_test(ip, XFS_IFLOCK | XFS_IRECLAIM))
> > -		return false;
> > +	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> > +		goto reclaim;
> >
> > 	/*
> > -	 * The radix tree lock here protects a thread in xfs_iget from racing
> > -	 * with us starting reclaim on the inode. Once we have the
> > -	 * XFS_IRECLAIM flag set it will not touch us.
> > -	 *
> > -	 * Due to RCU lookup, we may find inodes that have been freed and only
> > -	 * have XFS_IRECLAIM set. Indeed, we may see reallocated inodes that
> > -	 * aren't candidates for reclaim at all, so we must check the
> > -	 * XFS_IRECLAIMABLE is set first before proceeding to reclaim.
> > +	 * Now the inode is locked, we can actually determine if it is dirty
> > +	 * without racing with anything.
> > 	 */
> > -	spin_lock(&ip->i_flags_lock);
> > -	if (!__xfs_iflags_test(ip, XFS_IRECLAIMABLE) ||
> > -	    __xfs_iflags_test(ip, XFS_IRECLAIM)) {
> > -		/* not a reclaim candidate. */
> > -		spin_unlock(&ip->i_flags_lock);
> > -		return false;
> > +	ret = LRU_ROTATE;
> > +	if (xfs_ipincount(ip)) {
> > +		ra->dirty_skipped++;
>
> Hmm.. didn't we have an LSN check here?

Yes, but if the inode was not in the AIL, it would crash, so I
removed it :P

> Altogether, I think the logic in this function would be a lot more
> simple if we had something like the following:
>
> ...
> /* ret == LRU_SKIP */
> if (!xfs_inode_clean(ip)) {
> 	ret = LRU_ROTATE;
> 	lsn = ip->i_itemp->ili_item.li_lsn;
> 	ra->dirty_skipped++;
> }
> if (lsn && XFS_LSN_CMP(lsn, ra->lowest_lsn) < 0)
> 	ra->lowest_lsn = lsn;
> return ret;
>
> ... as the non-reclaim exit path.

Yeah, that was what I was thinking when you pointed out the
iflock_nowait issue above. I'll end up with something like this, I
think....
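
i.e. every "can't reclaim it right now" path falls through to a
common tail, something along these lines (rough sketch, assuming
the i_flags_lock is still held at that point):

	/* ret is LRU_SKIP or LRU_ROTATE here */
	if (!xfs_inode_clean(ip)) {
		/* !clean means ip->i_itemp is valid, so no oops */
		ret = LRU_ROTATE;
		lsn = ip->i_itemp->ili_item.li_lsn;
		ra->dirty_skipped++;
	}
	/* drop the flush lock here if we took it */
	spin_unlock(&ip->i_flags_lock);
	if (lsn && XFS_LSN_CMP(lsn, ra->lowest_lsn) < 0)
		ra->lowest_lsn = lsn;
	return ret;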

> > void
> > xfs_reclaim_all_inodes(
> > 	struct xfs_mount	*mp)
> > {
> ...
> > +	while (list_lru_count(&mp->m_inode_lru)) {
>
> It seems unnecessary to call this twice per-iter:
>
> while ((to_free = list_lru_count(&mp->m_inode_lru))) {
> ...
> }

*nod*.
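
And then the count can feed straight into the walk as the batch
size, something like:

	while ((to_free = list_lru_count(&mp->m_inode_lru))) {
		list_lru_walk(&mp->m_inode_lru,
				xfs_inode_reclaim_isolate, &ra, to_free);
		/* dispose of what was isolated, push on ra->lowest_lsn */
		cond_resched();
	}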

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx