Re: [PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()

From: David Hildenbrand
Date: Tue Jan 24 2023 - 08:20:00 EST


On 06.01.23 19:36, Liam Howlett wrote:
* Mike Rapoport <rppt@xxxxxxxxxx> [230106 02:28]:
On Thu, Jan 05, 2023 at 04:05:34PM +0000, Liam Howlett wrote:
Preallocations are common in the VMA code to avoid allocating under
certain locking conditions. The preallocations must also cover the
worst-case scenario. Removing the GFP_ZERO flag from the
kmem_cache_alloc() (and bulk variant) calls will reduce the amount of
time spent zeroing memory that may not be used. Only zero out the
necessary area to keep track of the allocations in the maple state.
Zero the entire node prior to using it in the tree.

This required internal changes to node counting on allocation, so the
test code is also updated.

This restores some micro-benchmark performance:
up to +9% in mmtests mmap1 by my testing
+10% to +20% in mmap, mmapaddr, mmapmany tests reported by Red Hat

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2149636
Reported-by: Jirka Hladky <jhladky@xxxxxxxxxx>
Suggested-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
Signed-off-by: Liam Howlett <Liam.Howlett@xxxxxxxxxx>
---
lib/maple_tree.c | 80 +++++++++++++++++---------------
tools/testing/radix-tree/maple.c | 18 +++----
2 files changed, 52 insertions(+), 46 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 26e2045d3cda..82a8121fe49b 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -149,13 +149,12 @@ struct maple_subtree_state {
/* Functions */
static inline struct maple_node *mt_alloc_one(gfp_t gfp)
{
- return kmem_cache_alloc(maple_node_cache, gfp | __GFP_ZERO);
+ return kmem_cache_alloc(maple_node_cache, gfp);
}
static inline int mt_alloc_bulk(gfp_t gfp, size_t size, void **nodes)
{
- return kmem_cache_alloc_bulk(maple_node_cache, gfp | __GFP_ZERO, size,
- nodes);
+ return kmem_cache_alloc_bulk(maple_node_cache, gfp, size, nodes);
}
static inline void mt_free_bulk(size_t size, void __rcu **nodes)
@@ -1127,9 +1126,10 @@ static inline struct maple_node *mas_pop_node(struct ma_state *mas)
{
struct maple_alloc *ret, *node = mas->alloc;
unsigned long total = mas_allocated(mas);
+ unsigned int req = mas_alloc_req(mas);
/* nothing or a request pending. */
- if (unlikely(!total))
+ if (WARN_ON(!total))

Hmm, isn't WARN_ON() here too much?

I don't think so. If we get to the point of asking for a node while we
don't have any to give, then it's too late. It means we (I) have
calculated the necessary nodes incorrectly and we won't have enough
memory to fit things into the tree. It should never happen.

Either way, the suggestion is to use WARN_ON_ONCE() instead of WARN_ON().

... now documented in Documentation/process/coding-style.rst

--
Thanks,

David / dhildenb