Re: [PATCH 0/8] Support multi-order entries in the radix tree

From: Ross Zwisler
Date: Wed Feb 24 2016 - 15:24:34 EST


On Tue, Jan 19, 2016 at 09:25:25AM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
>
> In order to support huge pages in the page cache, Kirill has proposed
> simply creating 512 entries. I think this runs into problems with
> fsync() tracking dirty bits in the radix tree. Ross inserts a special
> entry to represent the PMD at the index for the start of the PMD, but
> this requires probing the tree twice; once for the PTE and once for the PMD.
> When we add PUD entries, that will become three times.
>
> The approach in this patch set is to modify the radix tree to support
> multi-order entries. Pointers to internal radix tree nodes mostly do not
> have the 'indirect' bit set. I change that so they always have that bit
> set; then any pointer without the indirect bit set is a multi-order entry.
>
> If the order of the entry is a multiple of the fanout of the tree,
> then all is well. If not, it is necessary to insert alias nodes into
> the tree that point to the canonical entry. At this point, I have not
> added support for entries which are smaller than the last-level fanout of
> the tree (and I put a BUG_ON in to prevent that usage). Adding support
> would be a simple matter of one last pointer-chase when we get to the
> bottom of the tree, but I am not aware of any reason to add support for
> smaller multi-order entries at this point, so I haven't.
>
> Note that no actual users are modified at this point. I think it'd be
> mostly a matter of deleting code from the DAX fsync support at this point,
> but with that code in flux, I'm a little reluctant to add more churn
> to it. I'm also not entirely sure where Kirill is on the page-cache
> modifications; he seems to have his hands full fixing up the MM right now.
>
> Before diving into the important modifications, I add Andrew Morton's
> radix tree test harness to the tree in patches 1 & 2. It was absolutely
> invaluable in catching some of my bugs. Patches 3 & 4 are minor tweaks.
> Patches 5-7 are the interesting ones. Patch 8 we might want to leave
> out entirely or shift over to the test harness. I found it useful during
> debugging and others might too.
>
> Matthew Wilcox (8):
>   radix-tree: Add an explicit include of bitops.h
>   radix tree test harness
>   radix-tree: Cleanups
>   radix_tree: Convert some variables to unsigned types
>   radix_tree: Tag all internal tree nodes as indirect pointers
>   radix_tree: Loop based on shift count, not height
>   radix_tree: Add support for multi-order entries
>   radix_tree: Add radix_tree_dump
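
To make the tagging scheme in the cover letter concrete, here is a minimal
sketch of the idea as I read it: pointers to internal nodes always carry a low
"indirect" bit, so any slot without that bit is an entry. All names below
(SKETCH_INDIRECT_BIT, struct sketch_node, the helpers) are hypothetical and are
not the kernel's radix tree API; the sketch also assumes nodes are at least
2-byte aligned so that the low bit is free.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag bit; assumes nodes are aligned so bit 0 is unused. */
#define SKETCH_INDIRECT_BIT ((uintptr_t)1)

/* Hypothetical internal node; the fanout of 64 is an assumption. */
struct sketch_node {
	void *slots[64];
};

/* Tag a node pointer before storing it in a parent slot. */
static inline void *sketch_node_to_slot(struct sketch_node *node)
{
	return (void *)((uintptr_t)node | SKETCH_INDIRECT_BIT);
}

/* A slot with the indirect bit set points to another internal node. */
static inline int sketch_slot_is_internal(const void *slot)
{
	return ((uintptr_t)slot & SKETCH_INDIRECT_BIT) != 0;
}

/* Strip the tag to recover the node pointer. */
static inline struct sketch_node *sketch_slot_to_node(void *slot)
{
	return (struct sketch_node *)((uintptr_t)slot & ~SKETCH_INDIRECT_BIT);
}
```

With this convention a walker can stop descending as soon as it sees a slot
whose indirect bit is clear, at whatever level that happens, which is what
would let a single slot stand for a whole aligned range of indices.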

I like the idea of this approach - I'll work on integrating it into DAX *sync.

One quick note - some of the patches are prefixed with "radix-tree" and others
with "radix_tree"; it would be nice to make these consistent.

Also, if we go to the trouble of including the radix tree test harness,
should we include a new test at the end of the series that exercises
multi-order radix tree entries?
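
As a rough illustration of the property such a test might check, here is a
small sketch against a toy model rather than the real tree: an entry of a given
order should be returned for every index in its aligned range and for no index
outside it. The toy_* names and the flat array are assumptions made up for this
example; a real test would go through whatever insert and lookup interface the
series ends up exposing, and the toy ignores the fanout and alias-node details
from the cover letter.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 1024UL

/* Toy stand-in for the tree: one slot per index. Not the kernel API. */
static void *toy_slots[TOY_SLOTS];

/* Look up a single index; out-of-range indices are simply empty. */
static inline void *toy_lookup(unsigned long index)
{
	return index < TOY_SLOTS ? toy_slots[index] : NULL;
}

/*
 * Insert 'item' so it covers the 2^order indices of the aligned range
 * containing 'index'. Returns 0 on success, -1 if the range does not
 * fit or any index in it is already occupied.
 */
static inline int toy_insert_order(unsigned long index, unsigned int order,
				   void *item)
{
	unsigned long span, base, i;

	if (order >= 8 * sizeof(unsigned long))
		return -1;
	span = 1UL << order;
	base = index & ~(span - 1);
	if (base >= TOY_SLOTS || span > TOY_SLOTS - base)
		return -1;
	for (i = base; i < base + span; i++)
		if (toy_slots[i])
			return -1;
	for (i = base; i < base + span; i++)
		toy_slots[i] = item;
	return 0;
}

/*
 * Return 1 if every index in the aligned range resolves to 'item' and
 * the indices just outside the range do not, 0 otherwise.
 */
static inline int toy_check_order(unsigned long index, unsigned int order,
				  void *item)
{
	unsigned long span = 1UL << order;
	unsigned long base = index & ~(span - 1);
	unsigned long i;

	for (i = base; i < base + span; i++)
		if (toy_lookup(i) != item)
			return 0;
	if (base > 0 && toy_lookup(base - 1) == item)
		return 0;
	if (toy_lookup(base + span) == item)
		return 0;
	return 1;
}
```

For example, inserting an order-9 entry at index 512 and then calling
toy_check_order(512, 9, item) corresponds to the 512-entry PMD case: indices
512 through 1023 should all return the entry, and index 511 should not.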