RE: Cleancache and shared filesystems

From: Dan Magenheimer
Date: Tue May 31 2011 - 11:13:52 EST


> From: Steven Whitehouse [mailto:swhiteho@xxxxxxxxxx]
> Sent: Tuesday, May 31, 2011 2:58 AM
> To: Joel Becker
> Cc: Dan Magenheimer; linux-kernel@xxxxxxxxxxxxxxx; Sunil Mushran
> Subject: Re: Cleancache and shared filesystems
>
> Hi,
>
> On Fri, 2011-05-27 at 16:33 -0700, Joel Becker wrote:
> > On Fri, May 27, 2011 at 05:19:39PM +0100, Steven Whitehouse wrote:
> > > +	if (ls->ls_ops == &gfs2_dlm_ops) {
> > > +		if (gfs2_uuid_valid(sb->s_uuid))
> > > +			cleancache_init_shared_fs(sb->s_uuid, sb);
> > > +	} else {
> > > +		cleancache_init_fs(sb);
> > > +	}
> >
> > Hey Dan,
> > Steven makes a good point here. ocfs2 could also take advantage
> > of local filesystem behavior when running in local mode.
> >
> > Joel
> >
>
> There is a further issue as well - cleancache will only work when all
> nodes can see the same shared cache, so we will need a mount option to
> disable cleancache in the case we have (for example) a cluster of
> virtual machines split over multiple physical hosts.
>
> In fact, I think from the principle of least surprise this had better
> default to off and be enabled explicitly. Otherwise I can see that
> people will shoot themselves in the foot which will be very easy since
> there is no automatic way that I can see to verify that all nodes are
> looking at the same cache,

Though it's been nearly two years since I thought this
through, I remember being concerned about that issue too.
But, for ocfs2 at least, the cleancache hooks are embedded
in all the right places in VFS, so the ocfs2 code that
cross-invalidates stale page cache pages on different
nodes also ensures coherence of cleancache. It all just
worked, whether or not the VMs were split across hosts.

This may or may not be true for GFS2... for example, btrfs
required one cleancache hook outside of VFS to function
correctly.

Again, I am pretty ignorant about shared filesystems
so please correct me if I am missing anything important.

Also, I checked, and Xen tmem (the only current user of
cleancache for which cluster-sharing makes sense) uses
128-bit -1 as its internal "don't share" indicator.
Since 0 is not that indicator, you are correct that
multiple non-shared VMs using uuid==0 could potentially
cause data corruption if they share a physical machine,
and your code snippet above is needed (assuming
gfs2_uuid_valid returns false for uuid==0?).
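
To make the assumption concrete, here is a rough user-space
sketch of the kind of check I have in mind. The helper names
are hypothetical (this is not the actual gfs2_uuid_valid, which
I have not looked at), and rejecting all-ones as well as
all-zeroes is my own assumption, based only on tmem treating
128-bit -1 as "don't share":

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UUID_LEN 16	/* 128-bit UUID, the same size as sb->s_uuid */

/* Hypothetical helper: true if every byte of the UUID equals `value`. */
static bool uuid_all_bytes_equal(const uint8_t uuid[UUID_LEN], uint8_t value)
{
	for (size_t i = 0; i < UUID_LEN; i++) {
		if (uuid[i] != value)
			return false;
	}
	return true;
}

/*
 * Hypothetical policy check: treat a UUID as usable for a shared pool
 * only if it is neither all zeroes (never set, so unrelated filesystems
 * could collide on it) nor all ones (128-bit -1, which tmem uses
 * internally as "don't share").
 */
static bool uuid_usable_for_sharing(const uint8_t uuid[UUID_LEN])
{
	return !uuid_all_bytes_equal(uuid, 0x00) &&
	       !uuid_all_bytes_equal(uuid, 0xff);
}
```

With something like this, a filesystem whose uuid was never set
would not be handed to cleancache_init_shared_fs() at all.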


Dan
---
Thanks... for the memory!
I really could use more / my throughput's on the floor
The balloon is flat / my swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to Bob Hope)
