Re: first bad commit: [5795eb443060148796beeba106e4366d7f1458a6] scsi: sd_zbc: emulate ZONE_APPEND commands

From: Damien Le Moal
Date: Sat Sep 12 2020 - 09:00:13 EST


On 2020/09/12 18:09, Johannes Thumshirn wrote:
> On 12/09/2020 04:31, Damien Le Moal wrote:
>> On 2020/09/12 8:07, Borislav Petkov wrote:
>>> On Sat, Sep 12, 2020 at 12:17:59AM +0200, Borislav Petkov wrote:
>>>> Enabling it, fixes the issue.
>>>
>>> Btw, I just hit the below warn with 5.8, while booting with the above
>>> config option enabled. Looks familiar and I didn't trigger it with
>>> 5.9-rc4+ so you guys either fixed it or something changed in-between:
>>>
>>> [ 5.124321] ata4.00: NCQ Send/Recv Log not supported
>>> [ 5.131484] ata4.00: configured for UDMA/133
>>> [ 5.135847] scsi 3:0:0:0: Direct-Access ATA ST8000AS0022-1WL SN01 PQ: 0 ANSI: 5
>>> [ 5.143972] sd 3:0:0:0: Attached scsi generic sg1 type 0
>>> [ 5.144033] sd 3:0:0:0: [sdb] Host-aware zoned block device
>>> [ 5.177105] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
>>> [ 5.184880] sd 3:0:0:0: [sdb] 4096-byte physical blocks
>>> [ 5.190084] sd 3:0:0:0: [sdb] 29808 zones of 524288 logical blocks + 1 runt zone
>>> [ 5.197439] sd 3:0:0:0: [sdb] Write Protect is off
>>> [ 5.202220] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
>>> [ 5.207260] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>> [ 5.356631] sdb: sdb1
>>> [ 5.359014] sdb: disabling host aware zoned block device support due to partitions
>>> [ 5.389941] ------------[ cut here ]------------
>>> [ 5.394557] WARNING: CPU: 8 PID: 164 at block/blk-settings.c:236 blk_queue_max_zone_append_sectors+0x12/0x40
>>> [ 5.404300] Modules linked in:
>>> [ 5.407365] CPU: 8 PID: 164 Comm: kworker/u32:6 Not tainted 5.8.0 #7
>>> [ 5.413682] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PRO (MS-7B79), BIOS 1.70 01/23/2019
>>> [ 5.424191] Workqueue: events_unbound async_run_entry_fn
>>> [ 5.429482] RIP: 0010:blk_queue_max_zone_append_sectors+0x12/0x40
>>> [ 5.435543] Code: fe 0f 00 00 53 48 89 fb 0f 86 3d 07 00 00 48 89 b3 e0 03 00 00 5b c3 90 0f 1f 44 00 00 8b 87 40 04 00 00 ff c8 83 f8 01 76 03 <0f> 0b c3 8b 87 f8 03 00 00 39 87 f0 03 00 00 0f 46 87 f0 03 00 00
>>> [ 5.454099] RSP: 0018:ffffc90000697c60 EFLAGS: 00010282
>>> [ 5.459306] RAX: 00000000ffffffff RBX: ffff8887fa0a9400 RCX: 0000000000000000
>>> [ 5.466390] RDX: ffff8887faf0d400 RSI: 0000000000000540 RDI: ffff8887f0dde6c8
>>> [ 5.473474] RBP: 0000000000007471 R08: 00000000001d1c40 R09: ffff8887fee29ad0
>>> [ 5.480559] R10: 00000001434bac00 R11: 0000000000358275 R12: 0000000000080000
>>> [ 5.487643] R13: ffff8887f0dde6c8 R14: ffff8887fa0a9738 R15: 0000000000000000
>>> [ 5.494726] FS: 0000000000000000(0000) GS:ffff8887fee00000(0000) knlGS:0000000000000000
>>> [ 5.502757] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 5.508474] CR2: 0000000000000000 CR3: 0000000002209000 CR4: 00000000003406e0
>>> [ 5.515558] Call Trace:
>>> [ 5.518026] sd_zbc_read_zones+0x323/0x480
>>> [ 5.522122] sd_revalidate_disk+0x122b/0x2000
>>> [ 5.526472] ? __device_add_disk+0x2f7/0x4e0
>>> [ 5.530738] sd_probe+0x347/0x44b
>>> [ 5.534058] really_probe+0x2c4/0x3f0
>>> [ 5.537720] driver_probe_device+0xe1/0x150
>>> [ 5.541902] ? driver_allows_async_probing+0x50/0x50
>>> [ 5.546852] bus_for_each_drv+0x6a/0xa0
>>> [ 5.550683] __device_attach_async_helper+0x8c/0xd0
>>> [ 5.555547] async_run_entry_fn+0x4a/0x180
>>> [ 5.559636] process_one_work+0x1a5/0x3a0
>>> [ 5.563637] worker_thread+0x50/0x3a0
>>> [ 5.567300] ? process_one_work+0x3a0/0x3a0
>>> [ 5.571480] kthread+0x117/0x160
>>> [ 5.574715] ? kthread_park+0x90/0x90
>>> [ 5.578377] ret_from_fork+0x22/0x30
>>> [ 5.581960] ---[ end trace 94141003236730cf ]---
>>> [ 5.586578] sd 3:0:0:0: [sdb] Attached SCSI disk
>>> [ 6.186783] ata5: failed to resume link (SControl 0)
>>> [ 6.191818] ata5: SATA link down (SStatus 0 SControl 0)
>>>
>
>
> This looks like we're trying to configure zone append max sectors
> on a device that doesn't have the zoned flag set.

Yep. That's because sd_zbc_revalidate_zones() entry test uses sd_is_zoned() and
does not look at queue->limits.zoned which was changed to BLK_ZONE_NONE while
sd_is_zoned() still correctly says that the drive is host-aware. We need to
change the entry test to use blk_queue_is_zoned() instead of sd_is_zoned().

>
>>
>> Can you try this:
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 95018e650f2d..620539162ef1 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -2968,8 +2968,13 @@ static void sd_read_block_characteristics(struct
>> scsi_disk *sdkp)
>> } else {
>> sdkp->zoned = (buffer[8] >> 4) & 3;
>> if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
>> +#ifdef CONFIG_BLK_DEV_ZONED
>> /* Host-aware */
>> q->limits.zoned = BLK_ZONED_HA;
>> +#else
>> + /* Host-aware drive is treated as a regular disk */
>> + q->limits.zoned = BLK_ZONED_NONE;
>> +#endif
>> } else {
>> /*
>> * Treat drive-managed devices and host-aware devices
>> @@ -3404,12 +3409,12 @@ static int sd_probe(struct device *dev)
>> sdkp->first_scan = 1;
>> sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;
>>
>> + sd_revalidate_disk(gd);
>> +
>> error = sd_zbc_init_disk(sdkp);
>> if (error)
>> goto out_free_index;
>>
>> - sd_revalidate_disk(gd);
>> -
>
>
> I don't get how my patch may have broken this. If we have
> CONFIG_BLK_DEV_ZONED=n, sd_zbc_init_disk() is stubbed out and return 0
> unconditionally. So the call path will remain exactly the same.

Yes, I am not 100% sure what is going on with CONFIG_BLK_DEV_ZONED=n. As far as
I can see, everything should be OK, but clearly not... Let's fix that Monday
first thing.

Also, partitions/core.c may need some attention: with the zone append
initialization already done, if a partition is created, changing the disk from
HA to NONE should be done properly, cleaning up the zone offset array and max
zone append sectors... But we can't call into scsi directly, so need to check
that revalidation is triggered and detect the change... Not pretty. And the
reverse too: if partitions are deleted, we need to go back to zoned==HA and so
reinitialize zone append emulation.

At this point, I really would love to simply treat host-aware disks as regular
disks, always... That would make things *a lot* more simple.

>
>> gd->flags = GENHD_FL_EXT_DEVT;
>> if (sdp->removable) {
>> gd->flags |= GENHD_FL_REMOVABLE;
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 4933e7daf17d..f4dc81d48a01 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -241,6 +241,8 @@ static inline void sd_zbc_release_disk(struct scsi_disk
>> *sdkp) {}
>> static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
>> unsigned char *buf)
>> {
>> + if (sd_is_zoned(sdkp))
>> + sdkp->capacity = 0;
>> return 0;
>> }
>>
>> That should fix the above as well as the hang on boot with CONFIG_BLK_DEV_ZONED
>> disabled (for that one I do not totally understand what is going on...).
>>
>> We do not have any host-aware disk for testing (as far as I know, nobody is
>> selling these anymore), so our test setup is a bit lame in this area. We'll rig
>> something up with tcmu-runner emulation to add tests for these devices to avoid
>> a repeat of such problem. And we'll make sure to add a test for
>> host-aware+partitions, since we at least know for sure there is one user :)
>>
>> Johannes: The "goto out_free_index;" on sd_zbc_init_disk() failure is wrong I
>> think: the disk is already added and a ref taken on the dev, but out_free_index
>> does not seem to do cleanup for that. Need to revisit this.
>
> Yes just seen it as well, will be cooking a fix for that.
>
> I'll build a test env to nail this down.

Here is my patch, revisited. Not complete yet, b ut no space corruption this time !


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 95018e650f2d..49d6dc374fb6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2968,22 +2968,36 @@ static void sd_read_block_characteristics(struct
scsi_disk *sdkp)
} else {
sdkp->zoned = (buffer[8] >> 4) & 3;
if (sdkp->zoned == 1 && !disk_has_partitions(sdkp->disk)) {
+#ifdef CONFIG_BLK_DEV_ZONED
/* Host-aware */
q->limits.zoned = BLK_ZONED_HA;
+#else
+ /* Host-aware drive is treated as a regular disk */
+ q->limits.zoned = BLK_ZONED_NONE;
+#endif
} else {
/*
* Treat drive-managed devices and host-aware devices
* with partitions as regular block devices.
*/
q->limits.zoned = BLK_ZONED_NONE;
- if (sdkp->zoned == 2 && sdkp->first_scan)
- sd_printk(KERN_NOTICE, sdkp,
- "Drive-managed SMR disk\n");
}
}
- if (blk_queue_is_zoned(q) && sdkp->first_scan)
+
+ if (!sdkp->first_scan)
+ goto out;
+
+ if (blk_queue_is_zoned(q)) {
sd_printk(KERN_NOTICE, sdkp, "Host-%s zoned block device\n",
q->limits.zoned == BLK_ZONED_HM ? "managed" : "aware");
+ } else {
+ if (sdkp->zoned == 1)
+ sd_printk(KERN_NOTICE, sdkp,
+ "Host-aware SMR disk used as regular disk\n");
+ else if (sdkp->zoned == 2)
+ sd_printk(KERN_NOTICE, sdkp,
+ "Drive-managed SMR disk\n");
+ }

out:
kfree(buffer);
@@ -3404,12 +3418,12 @@ static int sd_probe(struct device *dev)
sdkp->first_scan = 1;
sdkp->max_medium_access_timeouts = SD_MAX_MEDIUM_TIMEOUTS;

+ sd_revalidate_disk(gd);
+
error = sd_zbc_init_disk(sdkp);
if (error)
goto out_free_index;

- sd_revalidate_disk(gd);
-
gd->flags = GENHD_FL_EXT_DEVT;
if (sdp->removable) {
gd->flags |= GENHD_FL_REMOVABLE;
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4933e7daf17d..f4dc81d48a01 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -241,6 +241,8 @@ static inline void sd_zbc_release_disk(struct scsi_disk
*sdkp) {}
static inline int sd_zbc_read_zones(struct scsi_disk *sdkp,
unsigned char *buf)
{
+ if (sd_is_zoned(sdkp))
+ sdkp->capacity = 0;
return 0;
}




--
Damien Le Moal
Western Digital Research