[19/99] dm snapshot: sort by chunk size to fix race

From: Greg KH
Date: Fri Nov 06 2009 - 17:46:27 EST

2.6.31-stable review patch. If anyone has any objections, please let us know.

From: Mikulas Patocka <mpatocka@xxxxxxxxxx>

commit 6d45d93ead319423099b82a4efd775bc0f159121 upstream.

Avoid a race causing corruption when snapshots of the same origin have
different chunk sizes by sorting the internal list of snapshots by chunk
size, largest first.

For example, let's have two snapshots with different chunk sizes. The
first snapshot (1) has small chunk size and the second snapshot (2) has
large chunk size. Let's have chunks A, B, C in these snapshots:
snapshot1: ====A==== ====B====
snapshot2: ==========C==========

(Chunk size is a power of 2. Chunks are aligned.)

A write to the origin at a position within A and C comes along. It
triggers reallocation of A, then reallocation of C and links them
together using A as the 'primary' exception.

Then another write to the origin comes along at a position within B and
C. It creates pending exception for B. C already has a reallocation in
progress and it already has a primary exception (A), so nothing is done
to it: B and C are not linked.

If the reallocation of B finishes before the reallocation of C, because
there is no link with the pending exception for C it does not know to
wait for it and, the second write is dispatched to the origin and causes
data corruption in the chunk C in snapshot2.

To avoid this situation, we maintain snapshots sorted in descending
order of chunk size. This leads to a guaranteed ordering on the links
between the pending exceptions and avoids the problem explained above -
both A and B now get linked to C.

Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Signed-off-by: Alasdair G Kergon <agk@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxx>

drivers/md/dm-snap.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -296,6 +296,7 @@ static void __insert_origin(struct origi
static int register_snapshot(struct dm_snapshot *snap)
+ struct dm_snapshot *l;
struct origin *o, *new_o;
struct block_device *bdev = snap->origin->bdev;

@@ -319,7 +320,11 @@ static int register_snapshot(struct dm_s

- list_add_tail(&snap->list, &o->snapshots);
+ /* Sort the list according to chunk size, largest-first smallest-last */
+ list_for_each_entry(l, &o->snapshots, list)
+ if (l->store->chunk_size < snap->store->chunk_size)
+ break;
+ list_add_tail(&snap->list, &l->list);

return 0;

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/