Re: [PATCH] zram: Fix unbalanced idr management at hot removal

From: Minchan Kim
Date: Mon Nov 21 2016 - 19:11:14 EST


On Mon, Nov 21, 2016 at 02:21:40PM +0100, Takashi Iwai wrote:
> The zram hot removal code calls idr_remove() even when zram_remove()
> returns an error (typically -EBUSY). The idr entry is then dropped
> while the device still exists, so the device is left over at module
> release, eventually leading to a crash when the module is reloaded.
>
> As described in the bug report below, the following procedure would
> cause an Oops with zram:
>
> - provision three zram devices via modprobe zram num_devices=3
> - configure a size for each device
>   + echo "1G" > /sys/block/$zram_name/disksize
> - mkfs and mount zram0 only
> - attempt to hot remove all three devices
>   + echo 2 > /sys/class/zram-control/hot_remove
>   + echo 1 > /sys/class/zram-control/hot_remove
>   + echo 0 > /sys/class/zram-control/hot_remove
>   - zram0 removal fails with EBUSY, as expected
> - unmount zram0
> - try zram0 hot remove again
>   + echo 0 > /sys/class/zram-control/hot_remove
>   - fails with ENODEV (unexpected)
> - unload zram kernel module
>   + completes successfully
> - zram0 device node still exists
> - attempt to mount /dev/zram0
>   + mount command is killed
>   + following BUG is encountered
>
> BUG: unable to handle kernel paging request at ffffffffa0002ba0
> IP: [<ffffffff812eead6>] get_disk+0x16/0x50
> Oops: 0000 [#1] SMP
> CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
> task: ffff88001a9f2800 task.stack: ffffc90000300000
> RIP: 0010:[<ffffffff812eead6>] [<ffffffff812eead6>] get_disk+0x16/0x50
> Call Trace:
> [<ffffffff812eeb1c>] exact_lock+0xc/0x20
> [<ffffffff813b3e1c>] kobj_lookup+0xdc/0x160
> [<ffffffff812edce0>] ? disk_map_sector_rcu+0x70/0x70
> [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
> [<ffffffff812eef4f>] get_gendisk+0x2f/0x110
> [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
> [<ffffffff81126e2c>] __blkdev_get+0x10c/0x3c0
> [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
> [<ffffffff8112727d>] blkdev_get+0x19d/0x2e0
> [<ffffffff81127410>] ? blkdev_get_by_dev+0x50/0x50
> [<ffffffff81127466>] blkdev_open+0x56/0x70
> [<ffffffff810f3e0f>] do_dentry_open.isra.19+0x1ff/0x310
> [<ffffffff810f4aa3>] vfs_open+0x43/0x60
> [<ffffffff81103009>] path_openat+0x2c9/0xf30
> [<ffffffff81023c00>] ? __save_stack_trace+0x40/0xd0
> [<ffffffff81104b79>] do_filp_open+0x79/0xd0
> [<ffffffff81538219>] ? kmemleak_alloc+0x49/0xa0
> [<ffffffff810f4e44>] do_sys_open+0x114/0x1e0
> [<ffffffff810f4f29>] SyS_open+0x19/0x20
> [<ffffffff8153c2e0>] entry_SYSCALL_64_fastpath+0x13/0x94
>
> This patch adds the missing error check in hot_remove_store() so that
> idr_remove() is called only when zram_remove() succeeds.
>
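
For anyone following along, the error check described above can be
modelled in userspace roughly as below. This is only a sketch, not the
driver code: struct fake_zram, index_table and the fake_*() helpers are
made-up stand-ins for the zram device, the index idr and the
idr_find()/idr_remove()/zram_remove() calls.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVICES 3

/* Hypothetical stand-in for the zram device. */
struct fake_zram {
	bool in_use;	/* models a mounted (busy) device */
};

/* Hypothetical stand-in for the index idr: id -> device pointer. */
static struct fake_zram devices[MAX_DEVICES];
static struct fake_zram *index_table[MAX_DEVICES];

/* Register every device in the index, all of them idle. */
static void fake_add_all(void)
{
	for (int i = 0; i < MAX_DEVICES; i++) {
		devices[i].in_use = false;
		index_table[i] = &devices[i];
	}
}

static struct fake_zram *fake_idr_find(int id)
{
	if (id < 0 || id >= MAX_DEVICES)
		return NULL;
	return index_table[id];
}

static void fake_idr_remove(int id)
{
	if (id >= 0 && id < MAX_DEVICES)
		index_table[id] = NULL;
}

/* Removal is refused while the device is busy. */
static int fake_zram_remove(struct fake_zram *zram)
{
	return zram->in_use ? -EBUSY : 0;
}

/* Old behaviour: the index entry is dropped even if removal failed. */
static int hot_remove_buggy(int id)
{
	struct fake_zram *zram = fake_idr_find(id);
	int ret;

	if (!zram)
		return -ENODEV;
	ret = fake_zram_remove(zram);
	fake_idr_remove(id);	/* unconditional: loses track of a live device */
	return ret;
}

/* Patched behaviour: the index entry is dropped only on success. */
static int hot_remove_fixed(int id)
{
	struct fake_zram *zram = fake_idr_find(id);
	int ret;

	if (!zram)
		return -ENODEV;
	ret = fake_zram_remove(zram);
	if (!ret)
		fake_idr_remove(id);
	return ret;
}
```

Replaying the report against this model: with device 0 busy, both
variants return -EBUSY. After the "unmount", the retry returns -ENODEV
with the old logic, while the patched logic returns 0 because the
entry was kept and the device can still be found.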
> Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
> Reported-and-tested-by: David Disseldorp <ddiss@xxxxxxx>
> Reviewed-by: David Disseldorp <ddiss@xxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Takashi Iwai <tiwai@xxxxxxx>
Acked-by: Minchan Kim <minchan@xxxxxxxxxx>

Thanks!