Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everythingbut mmap()

From: Kirill A. Shutemov
Date: Mon Oct 14 2013 - 09:56:53 EST


Dave Chinner wrote:
> On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote:
> > Andrew Morton wrote:
> > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
> > >
> > > > It brings thp support for ramfs, but without mmap() -- it will be posted
> > > > separately.
> > >
> > > We were never going to do this :(
> > >
> > > Has anyone reviewed these patches much yet?
> >
> > Dave did very good review. Few other people looked to separate patches.
> > See Reviewed-by/Acked-by tags in patches.
> >
> > It looks like most mm experts are busy with numa balancing nowadays, so
> > it's hard to get more review.
>
> Nobody has reviewed it from the filesystem side, though.
>
> The changes that require special code paths for huge pages in the
> write_begin/write_end paths are nasty. You're adding conditional
> code that depends on the page size and then having to add checks to
> ensure that large page operations don't step over small page
> boundaries and other such corner cases. It's an extremely fragile
> design, IMO.
>
> In general, I don't like all the if (thp) {} else {}; code that this
> series introduces - they are code paths that simply won't get tested
> with any sort of regularity and make the code more complex for those
> that aren't using THP to understand and debug...

Okay, I'll try to get rid of special cases where it's possible.

> Then there is a new per-inode lock that is used in
> generic_perform_write() which is held across page faults and calls
> to filesystem block mapping callbacks. This inserts into the middle
> of an existing locking chain that needs to be strictly ordered, and
> as such will lead to the same type of lock inversion problems that
> the mmap_sem had. We do not want to introduce a new lock that has
> this same problem just as we are getting rid of that long standing
> nastiness from the page fault path...

I don't see how we can protect against splitting with existing locks,
but I'll try find a way.

> I also note that you didn't convert invalidate_inode_pages2_range()
> to support huge pages which is needed by real filesystems that
> support direct IO. There are other truncate/invalidate interfaces
> that you didn't convert, either, and some of them will present you
> with interesting locking challenges as a result of adding that new
> lock...

Thanks. I'll take a look on these code paths.

> > The patchset was mostly ignored for few rounds and Dave suggested to split
> > to have less scary patch number.
>
> It's still being ignored by filesystem people because you haven't
> actually tried to implement support into a real filesystem.....

If it will support a real filesystem, wouldn't it be ignored due
patch count? ;)

> > > > Please review and consider applying.
> > >
> > > It appears rather too immature at this stage.
> >
> > More review is always welcome and I'm committed to address issues.
>
> IMO, supporting a real block based filesystem like ext4 or XFS and
> demonstrating that everything works is necessary before we go any
> further...

Will see what numbers I can bring in next iterations.

Thanks for your feedback. And sorry for late answer.

--
Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/