Re: [PATCH] mm: introduce kv[mz]alloc helpers

From: Vlastimil Babka
Date: Tue Jan 03 2017 - 05:23:15 EST


On 01/02/2017 02:37 PM, Michal Hocko wrote:
From: Michal Hocko <mhocko@xxxxxxxx>

Using kmalloc with the vmalloc fallback for larger allocations is a
common pattern in the kernel code. Yet we do not have any common helper
for that and so users have invented their own helpers. Some of them are
really creative when doing so. Let's just add kv[mz]alloc and make sure
it is implemented properly. This implementation makes sure to not make
a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
to not warn about allocation failures. This also rules out the OOM
killer as the vmalloc is a more approapriate fallback than a disruptive
user visible action.

This patch also changes some existing users and removes helpers which
are specific for them. In some cases this is not possible (e.g.
ext4_kvmalloc, libcfs_kvzalloc, __aa_kvmalloc) because those seems to be
broken and require GFP_NO{FS,IO} context which is not vmalloc compatible
in general (note that the page table allocation is GFP_KERNEL). Those
need to be fixed separately.

apparmor has already claimed kv[mz]alloc so remove those and use
__aa_kvmalloc instead to prevent from the naming clashes.

Changes since v1
- define __vmalloc_node_flags for CONFIG_MMU=n

Cc: Anatoly Stepanov <astepanov@xxxxxxxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Mike Snitzer <snitzer@xxxxxxxxxx>
Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
Cc: "Theodore Ts'o" <tytso@xxxxxxx>
Reviewed-by: Andreas Dilger <adilger@xxxxxxxxx> # ext4 part
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>

Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
(but with a small fix and suggestion below)

--- a/mm/util.c
+++ b/mm/util.c
@@ -346,6 +346,46 @@ unsigned long vm_mmap(struct file *file, unsigned long addr,
}
EXPORT_SYMBOL(vm_mmap);

+/**
+ * kvmalloc_node - allocate contiguous memory from SLAB with vmalloc fallback
+ * @size: size of the request.
+ * @flags: gfp mask for the allocation - must be compatible with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ */
+void *kvmalloc_node(size_t size, gfp_t flags, int node)
+{
+ gfp_t kmalloc_flags = flags;
+ void *ret;
+
+ /*
+ * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
+ * so the given set of flags has to be compatible.
+ */
+ WARN_ON((flags & GFP_KERNEL) != GFP_KERNEL);

Wouldn't a _ONCE be sufficient? It's unlikely that multiple wrong call sites appear out of the blue, but we don't want to flood the log from a single frequently called site. No strong feelings though.

+
+ /*
+ * Make sure that larger requests are not too disruptive - no OOM
+ * killer and no allocation failure warnings as we have a fallback
+ */
+ if (size > PAGE_SIZE)
+ kmalloc_flags |= __GFP_NORETRY | __GFP_NOWARN;
+
+ ret = kmalloc_node(size, kmalloc_flags, node);
+
+ /*
+ * It doesn't really make sense to fallback to vmalloc for sub page
+ * requests
+ */
+ if (ret || size < PAGE_SIZE)

This should be size <= PAGE_SIZE.

+ return ret;
+
+ return __vmalloc_node_flags(size, node, flags);
+}
+EXPORT_SYMBOL(kvmalloc_node);
+
void kvfree(const void *addr)
{
if (is_vmalloc_addr(addr))