Re: [PATCH v2 4/4] gfs2: convert to errseq_t based writeback error reporting for fsync

From: Jeff Layton
Date: Fri Jul 28 2017 - 08:47:47 EST


On Fri, 2017-07-28 at 13:37 +0100, Steven Whitehouse wrote:
> Hi,
>
>
> On 27/07/17 13:47, Bob Peterson wrote:
> > ----- Original Message -----
> > > On Wed, 2017-07-26 at 12:21 -0700, Matthew Wilcox wrote:
> > > > On Wed, Jul 26, 2017 at 01:55:38PM -0400, Jeff Layton wrote:
> > > > > @@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t
> > > > > start, loff_t end,
> > > > > if (ret)
> > > > > return ret;
> > > > > if (gfs2_is_jdata(ip))
> > > > > - filemap_write_and_wait(mapping);
> > > > > + ret = file_write_and_wait(file);
> > > > > + if (ret)
> > > > > + return ret;
> > > > > gfs2_ail_flush(ip->i_gl, 1);
> > > > > }
> > > >
> > > > Do we want to skip flushing the AIL if there was an error (possibly
> > > > previously encountered)? I'd think we'd want to flush the AIL then report
> > > > the error, like this:
> > > >
> > >
> > > I wondered about that. Note that earlier in the function, we also bail
> > > out without flushing the AIL if sync_inode_metadata fails, so I assumed
> > > that we'd want to do the same here.
> > >
> > > I could definitely be wrong and am fine with changing it if so.
> > > Discarding the error like we do today seems wrong though.
> > >
> > > Bob, thoughts?
> >
> > Hi Jeff, Matthew,
> >
> > I'm not sure there's a right or wrong answer here. I don't know what's
> > best from a "correctness" point of view.
> >
> > I guess I'm leaning toward Jeff's original solution where we don't
> > call gfs2_ail_flush() on error. The main purpose of ail_flush is to
> > go through buffer descriptors (bds) attached to the glock and generate
> > revokes for them in a new transaction. If there's an error condition,
> > trying to go through more hoops will probably just get us into more
> > trouble. If the error is -ENOMEM, we don't want to allocate new memory
> > for the new transaction. If the error is -EIO, we probably don't
> > want to encourage more writing either.
> >
> > So on the one hand, it might be good to get rid of the buffer descriptors
> > so we don't leak memory, but that's probably also done elsewhere.
> > I have not chased down what happens in that case, but the same thing
> > would happen in the existing -EIO case a few lines above.
> >
> > On the other hand, we probably don't want to start a new transaction
> > and start adding revokes to it, and such, due to the error.
> >
> > Perhaps Steve Whitehouse can weigh in?
> >
> > Regards,
> >
> > Bob Peterson
> > Red Hat File Systems
>
> Yes, we probably do want to skip the ail flush if there is an error. We
> don't know whether the error is permanent or transient at that stage. If
> a previous stage of the fsync has failed, then there may be nothing for
> the next stage to do anyway, so it is probably not a big deal either
> way. So long as the error is reported to the caller, then we should be ok,
>

Ok, cool. I'll plan to carry this patch for now as it depends on an
earlier one in the series. One more question though:

Is it correct in the gfs2_is_jdata case to ignore the range that was
passed in from the caller? ->fsync gets start and end arguments, but
this will always write back the whole range. Is that necessary in this
case?

--
Jeff Layton <jlayton@xxxxxxxxxx>