Re: [PATCHv5 4/8] zswap: add to mm/

From: Cody P Schafer
Date: Mon Feb 18 2013 - 14:50:43 EST

On 02/18/2013 11:24 AM, Seth Jennings wrote:
On 02/15/2013 10:04 PM, Ric Mason wrote:
On 02/14/2013 02:38 AM, Seth Jennings wrote:
+/* invalidates all pages for the given swap type */
+static void zswap_frontswap_invalidate_area(unsigned type)
+{
+	struct zswap_tree *tree = zswap_trees[type];
+	struct rb_node *node, *next;
+	struct zswap_entry *entry;
+	if (!tree)
+		return;
+	/* walk the tree and free everything */
+	spin_lock(&tree->lock);
+	node = rb_first(&tree->rbroot);
+	while (node) {
+		entry = rb_entry(node, struct zswap_entry, rbnode);
+		zs_free(tree->pool, entry->handle);
+		next = rb_next(node);
+		zswap_entry_cache_free(entry);
+		node = next;
+	}
+	tree->rbroot = RB_ROOT;

Why don't we need rb_erase() for every node?

We are freeing the entire tree here. try_to_unuse() in the swapoff
syscall should have already emptied the tree, but this is here for

rb_erase() will do things like rebalancing the tree; something that
just wastes time since we are in the process of freeing the whole
tree. We are holding the tree lock here so we are sure that no one
else is accessing the tree while it is in this transient broken state.

If we have a sub-tree like:
    A
   / \
  B   C

B == rb_first(tree)
A == rb_next(B)
C == rb_next(A)

The current code frees A (via zswap_entry_cache_free()) before examining C, so rb_next(C) results in a use-after-free of A.
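
To make the hazard concrete, here is a minimal userspace sketch (not kernel code) of an in-order successor on a tree with parent pointers. The struct node type and the next_inorder(), touch() and was_touched() helpers are hypothetical stand-ins invented for illustration; next_inorder() only follows the same general shape as rb_next(). The touched[] log exists purely to show which nodes the successor lookup has to read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rb_node: a node with a parent pointer. */
struct node {
	struct node *parent, *left, *right;
};

/* Log of nodes whose fields next_inorder() reads (illustration only). */
#define MAX_TOUCHED 64
static const struct node *touched[MAX_TOUCHED];
static size_t n_touched;

static void touch(const struct node *n)
{
	assert(n_touched < MAX_TOUCHED);
	touched[n_touched++] = n;
}

static int was_touched(const struct node *n)
{
	size_t i;

	for (i = 0; i < n_touched; i++)
		if (touched[i] == n)
			return 1;
	return 0;
}

/* In-order successor of n, or NULL if n is the last node. */
static struct node *next_inorder(struct node *n)
{
	struct node *parent;

	touch(n);
	if (n->right) {
		/* Successor is the leftmost node of the right subtree. */
		n = n->right;
		touch(n);
		while (n->left) {
			n = n->left;
			touch(n);
		}
		return n;
	}

	/* No right subtree: climb until we arrive from a left child. */
	while ((parent = n->parent) != NULL) {
		touch(parent);	/* about to read parent->right */
		if (n != parent->right)
			break;
		n = parent;
	}
	return parent;
}
```

With the three-node sub-tree above, asking for the successor of C has to read A's fields, because C has no right child and the walk climbs through its parent. In the patch's loop the entry embedding A has already been freed by that point.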

You can solve this by doing a post-order traversal of the tree, either

a) in the destructive manner used in a number of filesystems, see fs/ubifs/orphan.c ubifs_add_orphan(), for example.

b) or by doing something similar to this commit: , which I've been using for some yet-to-be-merged code.
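
For reference, option (a) can be sketched in userspace roughly as follows. This is an illustration of the general destructive post-order idiom, not code copied from ubifs or from the patch: struct node, node_new(), tree_destroy() and the live_nodes counter are all hypothetical names, and plain free() stands in for zs_free() plus zswap_entry_cache_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry embedding a struct rb_node. */
struct node {
	struct node *parent, *left, *right;
	int key;
};

/* Number of nodes currently allocated, so a test can check for leaks. */
static unsigned long live_nodes;

static struct node *node_new(struct node *parent, int key)
{
	struct node *n = calloc(1, sizeof(*n));

	assert(n != NULL);
	n->parent = parent;
	n->key = key;
	live_nodes++;
	return n;
}

/*
 * Free every node, children before parents, with no rebalancing.
 * The tree is torn apart as we go: each freed leaf is unlinked from
 * its parent, so a freed node is never dereferenced again.
 */
static void tree_destroy(struct node *root)
{
	struct node *n = root, *parent;

	while (n) {
		/* Descend until n is a leaf. */
		if (n->left) {
			n = n->left;
			continue;
		}
		if (n->right) {
			n = n->right;
			continue;
		}

		/* Unlink the leaf from its parent, then free it. */
		parent = n->parent;
		if (parent) {
			if (parent->left == n)
				parent->left = NULL;
			else
				parent->right = NULL;
		}
		free(n);
		live_nodes--;
		n = parent;
	}
}
```

The point of the ordering is that a node is only freed once both of its children are gone, and the only pointer read after the free is the parent saved beforehand. In the zswap case the caller would still reset the root to RB_ROOT afterwards, as the patch already does.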
