Re: next bio iters break discard?

From: Kent Overstreet
Date: Mon Jan 13 2014 - 21:33:54 EST


On Sun, Jan 12, 2014 at 07:52:40PM -0800, Hugh Dickins wrote:
> When I try to exercise heavy swapping with discard on mmotm 2014-01-09,
> I soon hit a NULL pointer dereference in __blk_recalc_rq_segments():
>
> __blk_recalc_rq_segments
> blk_recount_segments
> ll_back_merge_fn
> bio_attempt_back_merge
> blk_queue_bio
> generic_make_request
> submit_bio
> blkdev_issue_discard
> swap_do_scheduled_discard
> scan_swap_map_try_ssd_cluster
> scan_swap_map
> get_swap_page
> add_to_swap
> shrink_page_list
> etc. etc.
>
> The crash is on the NULL struct page pointer in page_to_pfn(bv.bv_page)
> on line 35 of block/blk-merge.c.
>
> The code around there is not very different from 3.13-rc8 (which doesn't
> crash), and I didn't notice REQ_DISCARD or bio_has_data() checks removed.
>
> I think it worked before because the old bio_for_each_segment()
> iterator was a straightforward "i < bio->bi_vcnt" loop which would
> do nothing when bi_vcnt is 0; but the new iterators are relying
> (perhaps) on bio->bi_iter.bi_size which is non-0 despite no data?
>
> I expect it would crash in the same way on other recent nexts and
> mmotms, I've not tried.
>
> Hugh

Ugh, discards. Wonder why this wasn't seen sooner, I can't figure out what the
null pointer deref actually was but if it was __blk_recalc_rq_segments() blowing
up, that shouldn't have had to wait for two discards to get merged to get
called.

(calling bio_for_each_segment() on REQ_DISCARD/REQ_WRITE_SAME bios should in
general work; bio_advance_iter() checks against BIO_NO_ADVANCE_ITER_MASK to
determine whether to advance down the bvec or just decrement bi_size. But for
counting segments bio_for_each_segment() is definitely not what we want.)

I think for discards we can deal with this easily enough -
__blk_recalc_rq_segments() will have to special case them - but there's a
similar (but worse) issue with WRITE_SAME, and looking at the code it does
attempt to merge WRITE_SAME requests too.

Jens, Martin - are we sure we want to merge WRITE_SAME requests? I'm not sure I
even see how that makes sense, at the very least it's a potential minefield.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/