Re: [PATCH] Possible data loss on ext[34], reiserfs with external journal

From: tytso
Date: Tue Dec 15 2009 - 11:45:21 EST


On Tue, Dec 15, 2009 at 01:19:57AM -0500, Oleg Drokin wrote:
> > + /*
> > + * If the journal is not located on the file system device,
> > + * then we must flush the file system device before we issue
> > + * the commit record
> > + */
> > + if (commit_transaction->t_flushed_data_blocks &&
> > + (journal->j_fs_dev != journal->j_dev) &&
> > + (journal->j_flags & JBD2_BARRIER))
> > + blkdev_issue_flush(journal->j_fs_dev, NULL);
> > +
>
> I am afraid this is not enough. This code is called after journal
> was flushed for async commit case, so it leaves a race window where
> journal transaction is already on disk and complete, but the data is
> still in cache somewhere.

No, that's actually fine. In the ASYNC_COMMIT case, the commit won't
be valid until the checksum is correct, and we won't have written any
descriptor blocks yet at this point. So there is no race: during that
window the commit block is written, but we won't write any descriptor
blocks until after the barrier returns.
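
To make the checksum argument concrete, here is a toy userspace model.
It is a sketch, not jbd2 code: every name in it (toy_journal,
toy_write_commit, and so on) is made up for illustration, and Adler-32
merely stands in for whatever checksum the journal really uses. It
assumes the commit record carries a checksum over the blocks of its
transaction, and that recovery refuses to replay a transaction whose
on-disk blocks do not match that checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16
#define TOY_MAX_BLOCKS 4

/* Hypothetical toy journal; these names do not exist in jbd2. */
struct toy_journal {
	uint8_t  disk[TOY_MAX_BLOCKS * TOY_BLOCK_SIZE]; /* blocks on "disk" */
	int      commit_on_disk;   /* has the commit record been written? */
	uint32_t commit_nblocks;   /* blocks covered by the commit record */
	uint32_t commit_checksum;  /* checksum stored in the commit record */
};

/* Adler-32, used here only as a stand-in checksum. */
static uint32_t toy_adler32(const uint8_t *buf, size_t len)
{
	uint32_t a = 1, b = 0;

	for (size_t i = 0; i < len; i++) {
		a = (a + buf[i]) % 65521u;
		b = (b + a) % 65521u;
	}
	return (b << 16) | a;
}

static void toy_journal_init(struct toy_journal *j)
{
	memset(j, 0, sizeof(*j));
}

/*
 * Async-commit style: the commit record goes out first, carrying the
 * checksum of the in-memory transaction blocks in txn.
 */
static void toy_write_commit(struct toy_journal *j, const uint8_t *txn,
			     uint32_t nblocks)
{
	assert(nblocks <= TOY_MAX_BLOCKS);
	j->commit_nblocks = nblocks;
	j->commit_checksum = toy_adler32(txn, (size_t)nblocks * TOY_BLOCK_SIZE);
	j->commit_on_disk = 1;
}

/* Copy block idx of the in-memory transaction to the toy disk. */
static void toy_write_block(struct toy_journal *j, uint32_t idx,
			    const uint8_t *txn)
{
	assert(idx < TOY_MAX_BLOCKS);
	memcpy(j->disk + (size_t)idx * TOY_BLOCK_SIZE,
	       txn + (size_t)idx * TOY_BLOCK_SIZE, TOY_BLOCK_SIZE);
}

/*
 * Recovery-side check: replay only if a commit record exists and the
 * on-disk blocks hash to the checksum stored in it.
 */
static int toy_txn_is_valid(const struct toy_journal *j)
{
	if (!j->commit_on_disk)
		return 0;
	return toy_adler32(j->disk,
			   (size_t)j->commit_nblocks * TOY_BLOCK_SIZE)
		== j->commit_checksum;
}
```

In this model, calling toy_write_commit() for a two-block transaction
and then toy_txn_is_valid() reports the transaction as invalid, and it
stays invalid until both blocks have gone out via toy_write_block().
That is the window in question: a crash inside it leaves a commit
record that recovery would ignore.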

> Also the callsite has this comment which is misleading, I think:
> /*
> * This is the right place to wait for data buffers both for ASYNC
> * and !ASYNC commit. If commit is ASYNC, we need to wait only after
> * the commit block went to disk (which happens above). If commit is
> * SYNC, we need to wait for data buffers before we start writing
> * commit block, which happens below in such setting.
> */

Yeah, that comment is confusing and not entirely accurate. I thought
about cleaning it up, and then decided to do that in a separate patch.

- Ted