Re: [PATCH] rd: Use a private inode for backing storage

From: Nick Piggin
Date: Sun Oct 21 2007 - 05:44:28 EST


On Sunday 21 October 2007 16:48, Eric W. Biederman wrote:
> Nick Piggin <nickpiggin@xxxxxxxxxxxx> writes:
> > Yes it does. It is exactly breaking the coherency between block
> > device and filesystem metadata coherency that Andrew cared about.
> > Whether or not that matters, that is a much bigger conceptual
> > change than simply using slightly more (reclaimable) memory in
> > some situations that my patch does.
> >
> > If you want to try convincing people to break that coherency,
> > fine, but it has to be done consistently and everywhere rather than
> > for a special case in rd.c.
>
> Nick. Reread the patch. The only thing your arguments have
> established for me is that this patch is not obviously correct. Which
> makes it ineligible for a back port.

OK, I missed that you set the new inode's aops to the ramdisk_aops
rather than the bd_inode. Which doesn't make a lot of sense because
you just have a lot of useless aops there now.


> Frankly I suspect the whole
> issue is to subtle and rare to make any backport make any sense. My
> apologies Christian.

It's a data corruption issue. I think it should be fixed.


> >> The only way we make it to that inode is through block
> >> device I/O so it lives at exactly the same level in the hierarchy as
> >> a real block device.
> >
> > No, it doesn't. A real block device driver does have its own
> > buffer cache as it's backing store. It doesn't know about
> > readpage or writepage or set_page_dirty or buffers or pagecache.
>
> Well those pages are only accessed through rd_blkdev_pagecache_IO
> and rd_ioctl.

Wrong. It will be via the LRU, will get ->writepage() called,
block_invalidate_page, etc. And I guess also via sb->s_inodes, where
drop_pagecache_sb might do stuff to it (although it probably escapes
harm). But you're right that it isn't the obviously correct fix for
the problem.


> >> My patch is the considered rewrite boiled down
> >> to it's essentials and made a trivial patch.
> >
> > What's the considered rewrite here? The rewrite I posted is the
> > only one so far that's come up that I would consider [worthy],
> > while these patches are just more of the same wrongness.
>
> Well it looks like you were blind when you read the patch.

If you think it is a nice way to go, then I think you are
blind ;)


> Because the semantics between the two are almost identical,
> except I managed to implement BLKFLSBUF in a backwards compatible
> way by flushing both the buffer cache and my private cache. You
> failed to flush the buffer cache in your implementation.

Obviously a simple typo that can be fixed by adding one line
of code.


> Yes. I use an inode 99% for it's mapping and the mapping 99% for it's
> radix_tree. But having truncate_inode_pages and grab_cache_page
> continue to work sure is convenient.

It's horrible. And using truncate_inode_pages / grab_cache_page and
new_inode is an incredible argument to save a few lines of code. You
obviously didn't realise your so called private pages would get
accessed via the LRU, for example. This is making a relatively
larger logical change than my patch, because now as well as having
a separate buffer cache and backing store, you are also making the
backing store pages visible to the VM.


> I certainly think it makes it a
> lot simpler to audit the code to change just one thing at a time (the
> backing store) then to rip out and replace everything and then try and
> prove that the two patches are equivalent.

I think it's a bad idea just to stir the shit. We should take the
simple fix for the problem, and then fix it properly.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/