Re: bcache_writeback: bch_writeback_thread...blk_queue_bio IO hang [was BUG: soft lockup]

From: Yannis Aribaud
Date: Tue Feb 02 2016 - 03:06:59 EST


Hi,

On February 1, 2016 at 20:42, "Eric Wheeler" <bcache@xxxxxxxxxxxxxxxxxx> wrote:
> [ changed subject, more below ]

> Are you using md-based raid5/raid6?

Not at all. In my setup, bcache is used on two bare drives (HDD and SSD)
behind a hardware RAID controller (disks configured in non-RAID mode).

>
> If so, it could be the raid_partial_stripes_expensive bug.
>
> If it *is* the bcache optimizations around partial_stripes_expensive, then
> please try the patch below and then do this *before* loading the bcache
> module (or at least before registering the bcache backing volume):
> echo 0 > /sys/block/BLK/queue/limits/raid_partial_stripes_expensive
> where BLK is your backing volume.
>
> I wrote this patch because we use hardware RAID5/6 and wanted to get the
> partial_stripes_expensive optimizations on by setting
> raid_partial_stripes_expensive=1 and io_opt=our_stride_width.
> Unfortunately it caused the backtrace in the LKML thread below, so we
> stopped using it.
>
> See also this thread, however, it shows a backtrace prior to
> removing the bio splitting code:
> https://lkml.org/lkml/2016/1/7/844

I'll take a look.

> Note that commit 749b61dab30736eb95b1ee23738cae90973d4fc3 might not
> exactly address this issue, but it might prevent full hangs. Make sure
> you cherry-pick commit 749b61dab30736eb95b1ee23738cae90973d4fc3 and
> hand-clean-up as necessary. It simplifies the bio code and deletes a bunch
> of stuff. It showed up in 4.4 and isn't in my patchset.
>
> After applying the commit above and setting
> raid_partial_stripes_expensive=1
> we still get errors like this:
> bcache: bch_count_io_errors() dm-3: IO error on writing data to cache, recovering
> but they don't lock up the system. Ultimately we run with
> raid_partial_stripes_expensive=0 because of these related problems and
> haven't had any issues. Also, fwiw, we run 4.1.y as our stable branch
> backed by hardware raid.

I'll give it a try if I find some time to do so.

Thanks.
--
Open is better