[PATCH RFC] lib: Make radix_tree_node_alloc() irq safe

From: Jan Kara
Date: Tue Jul 16 2013 - 13:06:46 EST


With users of radix_tree_preload() run from interrupt (CFQ is one such
possible user), the following race can happen:

radix_tree_preload()
...
radix_tree_insert()
radix_tree_node_alloc()
if (rtp->nr) {
ret = rtp->nodes[rtp->nr - 1];
<interrupt>
...
radix_tree_preload()
...
radix_tree_insert()
radix_tree_node_alloc()
if (rtp->nr) {
ret = rtp->nodes[rtp->nr - 1];

And we give out one radix tree node twice. That clearly results in radix
tree corruption with different results (usually OOPS) depending on which
two users of radix tree race.

Fix the problem by disabling interrupts when working with rtp variable.
In-interrupt user can still deplete our preloaded nodes but at least we
won't corrupt radix trees.

Signed-off-by: Jan Kara <jack@xxxxxxx>
---
lib/radix-tree.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)

There are some questions regarding this patch:
Do we really want to allow in-interrupt users of radix_tree_preload()? CFQ
could certainly do this in older kernels but that particular call site where I
saw the bug hit isn't there anymore so I'm not sure this can really happen with
recent kernels.

Also it is actually harmful to do preloading if you are in interrupt context
anyway. The disadvantage of disallowing radix_tree_preload() in interrupt is
that we would need to tweak radix_tree_node_alloc() to somehow recognize
whether the caller wants it to use preloaded nodes or not and that callers
would have to get it right (although maybe some magic in radix_tree_preload()
could handle that).

Opinions?

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index e796429..6f1045d 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -209,18 +209,26 @@ radix_tree_node_alloc(struct radix_tree_root *root)

if (!(gfp_mask & __GFP_WAIT)) {
struct radix_tree_preload *rtp;
+ unsigned long flags;

/*
* Provided the caller has preloaded here, we will always
* succeed in getting a node here (and never reach
- * kmem_cache_alloc)
+ * kmem_cache_alloc)... unless we race with interrupt also
+ * consuming preloaded nodes.
*/
rtp = &__get_cpu_var(radix_tree_preloads);
+ /*
+ * Disable interrupts to make sure radix_tree_node_alloc()
+ * called from interrupt cannot return the same node as we do.
+ */
+ local_irq_save(flags);
if (rtp->nr) {
ret = rtp->nodes[rtp->nr - 1];
rtp->nodes[rtp->nr - 1] = NULL;
rtp->nr--;
}
+ local_irq_restore(flags);
}
if (ret == NULL)
ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
@@ -269,6 +277,7 @@ int radix_tree_preload(gfp_t gfp_mask)
struct radix_tree_preload *rtp;
struct radix_tree_node *node;
int ret = -ENOMEM;
+ unsigned long flags;

preempt_disable();
rtp = &__get_cpu_var(radix_tree_preloads);
@@ -278,11 +287,15 @@ int radix_tree_preload(gfp_t gfp_mask)
if (node == NULL)
goto out;
preempt_disable();
+ local_irq_save(flags);
rtp = &__get_cpu_var(radix_tree_preloads);
- if (rtp->nr < ARRAY_SIZE(rtp->nodes))
+ if (rtp->nr < ARRAY_SIZE(rtp->nodes)) {
rtp->nodes[rtp->nr++] = node;
- else
+ local_irq_restore(flags);
+ } else {
+ local_irq_restore(flags);
kmem_cache_free(radix_tree_node_cachep, node);
+ }
}
ret = 0;
out:
--
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/