Re: Re: [PATCH 1/2] rbd: RBD_DEV_FLAG_THICK rbd_dev_flags bit

From: Ilya Dryomov
Date: Fri Mar 23 2018 - 05:34:09 EST


On Fri, Mar 23, 2018 at 10:31 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Thu, Mar 22, 2018 at 12:57 PM, äääå / KAMEIïHITOSHI
> <hitoshi.kamei.xm@xxxxxxxxxxx> wrote:
>> Hi Yang,
>>
>>> I am not sure is this the best way for this case, what about adding an option in "rbd map -o thick rbd/test"?
>>
>> I will add such option to the rbd map command to manipulate image settings. So, the end-user
>> do not change the settings directly via sysfs file.
>>
>>> @@ -4011,6 +4012,15 @@ static void rbd_queue_workfn(struct work_struct *work)
>>> goto err;
>>> }
>>>
>>> + /* Ignore/skip discard requests for thick-provision image */
>>>
>>> Just ignore? or return -EOPNOTSUPP?
>>
>> Thanks, I think -EOPNOTSUPP is better because user programs cannot know
>> the result of requested operation when the kernel rbd driver ignores
>> discard request. The result of requested operation when the kernel rbd driver
>> ignores discard requests, which probably misleads the user programs.
>>
>>> In addition, we should not ignore the REQ_OP_WRITE_ZEROES.
>>
>> Relating to the above, the return code of REQ_OP_WRITE_ZEROS request
>> is also -EOPNOTSUPP instead of ignoring. I think the result of
>> -EOPNOTSUPP is also better for this request because the kernel
>> rbd driver can expect that user programs write zero data by itself.
>
> REQ_OP_WRITE_ZEROES should continue to work, we just need to make
> sure it never issues truncates or deletes and instead writes zeroes
> explicitly.
>
> I think we should be explicit about the fact that discard is not
> supported instead of accepting the discard request and failing it in
> rbd_queue_workfn(). Attached patch is what I have in mind.

Now with the patch.

Thanks,

Ilya
From 1db405366371cb12e5182815c06f6f491af4b63c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov@xxxxxxxxx>
Date: Fri, 23 Mar 2018 09:14:47 +0100
Subject: [PATCH] rbd: notrim map option

Add an option to turn off discard and write zeroes offload support to
avoid deprovisioning a fully provisioned image. When enabled, discard
requests will fail with -EOPNOTSUPP, write zeroes requests will fall
back to manually zeroing.

Signed-off-by: Ilya Dryomov <idryomov@xxxxxxxxx>
---
drivers/block/rbd.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 1a90e5dd7470..dbe440a6531e 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -733,6 +733,7 @@ enum {
Opt_read_write,
Opt_lock_on_read,
Opt_exclusive,
+ Opt_notrim,
Opt_err
};

@@ -746,6 +747,7 @@ static match_table_t rbd_opts_tokens = {
{Opt_read_write, "rw"}, /* Alternate spelling */
{Opt_lock_on_read, "lock_on_read"},
{Opt_exclusive, "exclusive"},
+ {Opt_notrim, "notrim"},
{Opt_err, NULL}
};

@@ -754,12 +756,14 @@ struct rbd_options {
bool read_only;
bool lock_on_read;
bool exclusive;
+ bool trim;
};

#define RBD_QUEUE_DEPTH_DEFAULT BLKDEV_MAX_RQ
#define RBD_READ_ONLY_DEFAULT false
#define RBD_LOCK_ON_READ_DEFAULT false
#define RBD_EXCLUSIVE_DEFAULT false
+#define RBD_TRIM_DEFAULT true

static int parse_rbd_opts_token(char *c, void *private)
{
@@ -801,6 +805,9 @@ static int parse_rbd_opts_token(char *c, void *private)
case Opt_exclusive:
rbd_opts->exclusive = true;
break;
+ case Opt_notrim:
+ rbd_opts->trim = false;
+ break;
default:
/* libceph prints "bad option" msg */
return -EINVAL;
@@ -3940,11 +3947,12 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
blk_queue_io_min(q, objset_bytes);
blk_queue_io_opt(q, objset_bytes);

- /* enable the discard support */
- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
- q->limits.discard_granularity = objset_bytes;
- blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT);
- blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT);
+ if (rbd_dev->opts->trim) {
+ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+ q->limits.discard_granularity = objset_bytes;
+ blk_queue_max_discard_sectors(q, objset_bytes >> SECTOR_SHIFT);
+ blk_queue_max_write_zeroes_sectors(q, objset_bytes >> SECTOR_SHIFT);
+ }

if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
q->backing_dev_info->capabilities |= BDI_CAP_STABLE_WRITES;
@@ -5170,6 +5178,7 @@ static int rbd_add_parse_args(const char *buf,
rbd_opts->queue_depth = RBD_QUEUE_DEPTH_DEFAULT;
rbd_opts->lock_on_read = RBD_LOCK_ON_READ_DEFAULT;
rbd_opts->exclusive = RBD_EXCLUSIVE_DEFAULT;
+ rbd_opts->trim = RBD_TRIM_DEFAULT;

copts = ceph_parse_options(options, mon_addrs,
mon_addrs + mon_addrs_size - 1,
--
2.4.3