Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everythingbut mmap()

From: Dave Chinner
Date: Wed Sep 25 2013 - 19:29:33 EST


On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> Andrew Morton wrote:
> > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> >
> > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > separately.
> >
> > We were never going to do this :(
> >
> > Has anyone reviewed these patches much yet?
>
> Dave did very good review. Few other people looked to separate patches.
> See Reviewed-by/Acked-by tags in patches.
>
> It looks like most mm experts are busy with numa balancing nowadays, so
> it's hard to get more review.

Nobody has reviewed it from the filesystem side, though.

The changes that require special code paths for huge pages in the
write_begin/write_end paths are nasty. You're adding conditional
code that depends on the page size and then having to add checks to
ensure that large page operations don't step over small page
boundaries and other such corner cases. It's an extremely fragile
design, IMO.

In general, I don't like all the if (thp) {} else {}; code that this
series introduces - they are code paths that simply won't get tested
with any sort of regularity and make the code more complex for those
that aren't using THP to understand and debug...

Then there is a new per-inode lock that is used in
generic_perform_write() which is held across page faults and calls
to filesystem block mapping callbacks. This inserts into the middle
of an existing locking chain that needs to be strictly ordered, and
as such will lead to the same type of lock inversion problems that
the mmap_sem had. We do not want to introduce a new lock that has
this same problem just as we are getting rid of that long standing
nastiness from the page fault path...

I also note that you didn't convert invalidate_inode_pages2_range()
to support huge pages which is needed by real filesystems that
support direct IO. There are other truncate/invalidate interfaces
that you didn't convert, either, and some of them will present you
with interesting locking challenges as a result of adding that new
lock...

> The patchset was mostly ignored for few rounds and Dave suggested to split
> to have less scary patch number.

It's still being ignored by filesystem people because you haven't
actually tried to implement support into a real filesystem.....

> > > Please review and consider applying.
> >
> > It appears rather too immature at this stage.
>
> More review is always welcome and I'm committed to address issues.

IMO, supporting a real block based filesystem like ext4 or XFS and
demonstrating that everything works is necessary before we go any
further...

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/