# Re: [PATCH 01/23] radix-tree: implement preload for multiplecontiguous elements

**From: **Jan Kara

**Date: ** Mon Aug 05 2013 - 07:17:50 EST

On Sun 04-08-13 05:17:03, Kirill A. Shutemov wrote:

>* From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>*

>* *

>* The radix tree is variable-height, so an insert operation not only has*

>* to build the branch to its corresponding item, it also has to build the*

>* branch to existing items if the size has to be increased (by*

>* radix_tree_extend).*

>* *

>* The worst case is a zero height tree with just a single item at index 0,*

>* and then inserting an item at index ULONG_MAX. This requires 2 new branches*

>* of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.*

>* *

>* Radix tree is usually protected by spin lock. It means we want to*

>* pre-allocate required memory before taking the lock.*

>* *

>* Currently radix_tree_preload() only guarantees enough nodes to insert*

>* one element. It's a hard limit. For transparent huge page cache we want*

>* to insert HPAGE_PMD_NR (512 on x86-64) entries to address_space at once.*

>* *

>* This patch introduces radix_tree_preload_count(). It allows to*

>* preallocate nodes enough to insert a number of *contiguous* elements.*

>* The feature costs about 5KiB per-CPU, details below.*

>* *

>* Worst case for adding N contiguous items is adding entries at indexes*

>* (ULONG_MAX - N) to ULONG_MAX. It requires nodes to insert single worst-case*

>* item plus extra nodes if you cross the boundary from one node to the next.*

>* *

>* Preload uses per-CPU array to store nodes. The total cost of preload is*

>* "array size" * sizeof(void*) * NR_CPUS. We want to increase array size*

>* to be able to handle 512 entries at once.*

>* *

>* Size of array depends on system bitness and on RADIX_TREE_MAP_SHIFT.*

>* *

>* We have three possible RADIX_TREE_MAP_SHIFT:*

>* *

>* #ifdef __KERNEL__*

>* #define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)*

>* #else*

>* #define RADIX_TREE_MAP_SHIFT 3 /* For more stressful testing */*

>* #endif*

>* *

>* On 64-bit system:*

>* For RADIX_TREE_MAP_SHIFT=3, old array size is 43, new is 107.*

>* For RADIX_TREE_MAP_SHIFT=4, old array size is 31, new is 63.*

>* For RADIX_TREE_MAP_SHIFT=6, old array size is 21, new is 30.*

>* *

>* On 32-bit system:*

>* For RADIX_TREE_MAP_SHIFT=3, old array size is 21, new is 84.*

>* For RADIX_TREE_MAP_SHIFT=4, old array size is 15, new is 46.*

>* For RADIX_TREE_MAP_SHIFT=6, old array size is 11, new is 19.*

>* *

>* On most machines we will have RADIX_TREE_MAP_SHIFT=6. In this case,*

>* on 64-bit system the per-CPU feature overhead is*

>* for preload array:*

>* (30 - 21) * sizeof(void*) = 72 bytes*

>* plus, if the preload array is full*

>* (30 - 21) * sizeof(struct radix_tree_node) = 9 * 560 = 5040 bytes*

>* total: 5112 bytes*

>* *

>* on 32-bit system the per-CPU feature overhead is*

>* for preload array:*

>* (19 - 11) * sizeof(void*) = 32 bytes*

>* plus, if the preload array is full*

>* (19 - 11) * sizeof(struct radix_tree_node) = 8 * 296 = 2368 bytes*

>* total: 2400 bytes*

>* *

>* Since only THP uses batched preload at the moment, we disable (set max*

>* preload to 1) it if !CONFIG_TRANSPARENT_HUGEPAGE_PAGECACHE. This can be*

>* changed in the future.*

>* *

>* Signed-off-by: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>*

>* Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>*

>* Acked-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>*

>* ---*

>* include/linux/radix-tree.h | 11 +++++++++++*

>* lib/radix-tree.c | 41 ++++++++++++++++++++++++++++++++---------*

>* 2 files changed, 43 insertions(+), 9 deletions(-)*

...

>* diff --git a/lib/radix-tree.c b/lib/radix-tree.c*

>* index 7811ed3..99ab73c 100644*

>* --- a/lib/radix-tree.c*

>* +++ b/lib/radix-tree.c*

>* @@ -82,16 +82,24 @@ static struct kmem_cache *radix_tree_node_cachep;*

>* * The worst case is a zero height tree with just a single item at index 0,*

>* * and then inserting an item at index ULONG_MAX. This requires 2 new branches*

>* * of RADIX_TREE_MAX_PATH size to be created, with only the root node shared.*

>* + **

>* + * Worst case for adding N contiguous items is adding entries at indexes*

>* + * (ULONG_MAX - N) to ULONG_MAX. It requires nodes to insert single worst-case*

>* + * item plus extra nodes if you cross the boundary from one node to the next.*

>* + **

>* * Hence:*

>* */*

>* -#define RADIX_TREE_PRELOAD_SIZE (RADIX_TREE_MAX_PATH * 2 - 1)*

>* +#define RADIX_TREE_PRELOAD_MIN (RADIX_TREE_MAX_PATH * 2 - 1)*

>* +#define RADIX_TREE_PRELOAD_MAX \*

>* + (RADIX_TREE_PRELOAD_MIN + \*

>* + DIV_ROUND_UP(RADIX_TREE_PRELOAD_NR - 1, RADIX_TREE_MAP_SIZE))*

Umm, is this really correct? I see two problems:

1) You may need internal tree nodes at various levels but you seem to

account only for the level 1.

2) The rounding doesn't seem right because RADIX_TREE_MAP_SIZE+2 nodes may

require 3 nodes at level 1 if the indexes are like:

i_0 | i_1 .. i_{RADIX_TREE_MAP_SIZE} | i_{RADIX_TREE_MAP_SIZE+1}

^ ^

node boundary node boundary

Otherwise the patch looks good.

Honza

--

Jan Kara <jack@xxxxxxx>

SUSE Labs, CR

--

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in

the body of a message to majordomo@xxxxxxxxxxxxxxx

More majordomo info at http://vger.kernel.org/majordomo-info.html

Please read the FAQ at http://www.tux.org/lkml/