[PATCH 3.16 207/366] dm thin: handle running out of data space vs concurrent discard

From: Ben Hutchings
Date: Sun Nov 11 2018 - 15:05:21 EST

Next message: Ben Hutchings: "[PATCH 3.16 041/366] powerpc: make feature-fixup tests fortify-safe"
Previous message: Ben Hutchings: "[PATCH 3.16 087/366] perf/core: Fix group scheduling with mixed hw and sw events"
In reply to: Ben Hutchings: "[PATCH 3.16 087/366] perf/core: Fix group scheduling with mixed hw and sw events"
Next in thread: Ben Hutchings: "[PATCH 3.16 041/366] powerpc: make feature-fixup tests fortify-safe"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

3.16.61-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Mike Snitzer <snitzer@xxxxxxxxxx>

commit a685557fbbc3122ed11e8ad3fa63a11ebc5de8c3 upstream.

Discards issued to a DM thin device can complete to userspace (via
fstrim) _before_ the metadata changes associated with the discards is
reflected in the thinp superblock (e.g. free blocks). As such, if a
user constructs a test that loops repeatedly over these steps, block
allocation can fail due to discards not having completed yet:
1) fill thin device via filesystem file
2) remove file
3) fstrim

=46rominitial report, here:
https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html

"The root cause of this issue is that dm-thin will first remove
mapping and increase corresponding blocks' reference count to prevent
them from being reused before DISCARD bios get processed by the
underlying layers. However. increasing blocks' reference count could
also increase the nr_allocated_this_transaction in struct sm_disk
which makes smd->old_ll.nr_allocated +
smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
In this case, alloc_data_block() will never commit metadata to reset
the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
always return an underflow value."

While there is room for improvement to the space-map accounting that
thinp is making use of: the reality is this test is inherently racey and
will result in the previous iteration's fstrim's discard(s) completing
vs concurrent block allocation, via dd, in the next iteration of the
loop.

No amount of space map accounting improvements will be able to allow
user's to use a block before a discard of that block has completed.

So the best we can really do is allow DM thinp to gracefully handle such
aggressive use of all the pool's data by degrading the pool into
out-of-data-space (OODS) mode. We _should_ get that behaviour already
(if space map accounting didn't falsely cause alloc_data_block() to
believe free space was available).. but short of that we handle the
current reality that dm_pool_alloc_data_block() can return -ENOSPC.

Reported-by: Dennis Yang <dennisyang@xxxxxxxx>
Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
---
drivers/md/dm-thin.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -938,6 +938,8 @@ static void schedule_zero(struct thin_c

static void set_pool_mode(struct pool *pool, enum pool_mode new_mode);

+static void requeue_bios(struct pool *pool);
+
static void check_for_space(struct pool *pool)
{
int r;
@@ -950,8 +952,10 @@ static void check_for_space(struct pool
if (r)
return;

- if (nr_free)
+ if (nr_free) {
set_pool_mode(pool, PM_WRITE);
+ requeue_bios(pool);
+ }
}

/*
@@ -1028,7 +1032,10 @@ static int alloc_data_block(struct thin_

r = dm_pool_alloc_data_block(pool->pmd, result);
if (r) {
- metadata_operation_failed(pool, "dm_pool_alloc_data_block", r);
+ if (r == -ENOSPC)
+ set_pool_mode(pool, PM_OUT_OF_DATA_SPACE);
+ else
+ metadata_operation_failed(pool, "dm_pool_alloc_data_block", r);
return r;
}

Next message: Ben Hutchings: "[PATCH 3.16 041/366] powerpc: make feature-fixup tests fortify-safe"
Previous message: Ben Hutchings: "[PATCH 3.16 087/366] perf/core: Fix group scheduling with mixed hw and sw events"
In reply to: Ben Hutchings: "[PATCH 3.16 087/366] perf/core: Fix group scheduling with mixed hw and sw events"
Next in thread: Ben Hutchings: "[PATCH 3.16 041/366] powerpc: make feature-fixup tests fortify-safe"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]