Re: [PATCH] bio: modify __bio_add_page() to accept pages that don't start a new segment

From: Maurizio Lombardi
Date: Tue Apr 29 2014 - 11:54:15 EST


Sorry, I made a mistake in this patch: on failure I should restore the original value
of bi_phys_segments.
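
To make the problem concrete, here is a minimal userspace sketch. It is not
the kernel code and not necessarily what the next version will look like.
struct model_bio, model_add_page() and the recounted_segments/reject
parameters are made up for illustration: recounted_segments stands in for
whatever blk_recount_segments() would compute with the new entry included,
and reject stands in for any later check refusing the page. The point is
only that the counter is saved before it is touched and put back on failure.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_VECS 8	/* hypothetical capacity of the vector array */

/* Hypothetical stand-in for the two struct bio counters involved. */
struct model_bio {
	unsigned int vcnt;		/* models bio->bi_vcnt */
	unsigned int phys_segments;	/* models bio->bi_phys_segments */
};

/*
 * Count the new entry first, recount if the limit is exceeded, and
 * undo everything if the page is refused. Returns true if accepted.
 */
static bool model_add_page(struct model_bio *bio, unsigned int max_segments,
			   unsigned int recounted_segments, bool reject)
{
	/* Remember the value we must go back to on failure. */
	unsigned int saved_segments = bio->phys_segments;

	assert(bio->vcnt < MODEL_MAX_VECS);

	bio->vcnt++;
	bio->phys_segments++;

	if (bio->phys_segments > max_segments) {
		/* Models the recount overwriting the stored value. */
		bio->phys_segments = recounted_segments;
		if (bio->phys_segments > max_segments)
			goto failed;
	}

	/* Models a later refusal, e.g. by a merge callback. */
	if (reject)
		goto failed;

	return true;

failed:
	bio->vcnt--;
	/* A plain decrement would be wrong once a recount has happened. */
	bio->phys_segments = saved_segments;
	return false;
}
```

In this model, starting from vcnt = 4 and phys_segments = 4 with a limit of
4, a call where the recount yields 3 and the page is then rejected leaves
phys_segments at 4. With the failed: path as posted, the recounted value
would be left in place instead.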

I'm going to send a new version.

Maurizio Lombardi

On Tue, Apr 29, 2014 at 04:58:18PM +0200, Maurizio Lombardi wrote:
> The original behaviour is to refuse to add a new page if the maximum number
> of segments has been reached, regardless of whether the page we are
> going to add can be merged into the last segment.
>
> Unfortunately, when the system runs under heavy memory fragmentation conditions,
> a driver may try to add multiple pages to the last segment.
> The original code won't accept them and EBUSY will be reported to
> userspace.
>
> This patch modifies the function so that it refuses to add a page
> only if the page starts a new segment and the maximum number
> of segments has already been reached.
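
The difference between the two rules can be sketched in userspace as below.
This is a simplified model under stated assumptions, not the kernel logic:
struct model_vec and the model_*() helpers are hypothetical, a page is
treated as mergeable only when it starts exactly where the previous entry
ends, and other queue limits are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one vector entry, reduced to a physical range. */
struct model_vec {
	unsigned long phys;	/* physical start address */
	unsigned long len;	/* length in bytes */
};

/* Simplified mergeability: next begins exactly where prev ends. */
static bool model_mergeable(const struct model_vec *prev,
			    const struct model_vec *next)
{
	return prev->phys + prev->len == next->phys;
}

/* A segment is a run of physically contiguous entries. */
static unsigned int model_count_segments(const struct model_vec *v, size_t n)
{
	unsigned int segs = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (i == 0 || !model_mergeable(&v[i - 1], &v[i]))
			segs++;
	return segs;
}

/* Old rule: refuse as soon as the limit has been reached. */
static bool model_old_accepts(const struct model_vec *v, size_t n,
			      unsigned int max_segments)
{
	return model_count_segments(v, n) < max_segments;
}

/* New rule: refuse only if the page would push the count over the limit. */
static bool model_new_accepts(const struct model_vec *v, size_t n,
			      const struct model_vec *next,
			      unsigned int max_segments)
{
	unsigned int segs = model_count_segments(v, n);

	/* A page that does not extend the last segment starts a new one. */
	if (n == 0 || !model_mergeable(&v[n - 1], next))
		segs++;
	return segs <= max_segments;
}
```

With two non-contiguous entries and a limit of two segments, the old rule
refuses every further page, while the new rule still accepts a page that is
contiguous with the last entry and refuses one that is not.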
>
> The bug can be easily reproduced with the st driver:
>
> 1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE to 16
> 2) modprobe st buffer_kbs=1024
> 3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
> dd: error writing ‘/dev/st0’: Device or resource busy
>
> Signed-off-by: Maurizio Lombardi <mlombard@xxxxxxxxxx>
> ---
> fs/bio.c | 50 ++++++++++++++++++++++++++++----------------------
> 1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 6f0362b..9a3a0b1 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -750,29 +750,31 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> return 0;
>
> /*
> - * we might lose a segment or two here, but rather that than
> - * make this too complex.
> + * setup the new entry, we might clear it again later if we
> + * cannot add the page
> + */
> + bvec = &bio->bi_io_vec[bio->bi_vcnt];
> + bvec->bv_page = page;
> + bvec->bv_len = len;
> + bvec->bv_offset = offset;
> + bio->bi_vcnt++;
> + bio->bi_phys_segments++;
> +
> + /*
> + * Perform a recount if the number of segments is greater
> + * than queue_max_segments(q).
> */
>
> - while (bio->bi_phys_segments >= queue_max_segments(q)) {
> + while (bio->bi_phys_segments > queue_max_segments(q)) {
>
> if (retried_segments)
> - return 0;
> + goto failed;
>
> retried_segments = 1;
> blk_recount_segments(q, bio);
> }
>
> /*
> - * setup the new entry, we might clear it again later if we
> - * cannot add the page
> - */
> - bvec = &bio->bi_io_vec[bio->bi_vcnt];
> - bvec->bv_page = page;
> - bvec->bv_len = len;
> - bvec->bv_offset = offset;
> -
> - /*
> * if queue has other restrictions (eg varying max sector size
> * depending on offset), it can specify a merge_bvec_fn in the
> * queue to get further control
> @@ -789,23 +791,27 @@ static int __bio_add_page(struct request_queue *q, struct bio *bio, struct page
> * merge_bvec_fn() returns number of bytes it can accept
> * at this offset
> */
> - if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len) {
> - bvec->bv_page = NULL;
> - bvec->bv_len = 0;
> - bvec->bv_offset = 0;
> - return 0;
> - }
> + if (q->merge_bvec_fn(q, &bvm, bvec) < bvec->bv_len)
> + goto failed;
> }
>
> /* If we may be able to merge these biovecs, force a recount */
> - if (bio->bi_vcnt && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
> + if (bio->bi_vcnt > 1 && (BIOVEC_PHYS_MERGEABLE(bvec-1, bvec)))
> bio->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> - bio->bi_vcnt++;
> - bio->bi_phys_segments++;
> done:
> bio->bi_iter.bi_size += len;
> return len;
> +
> + failed:
> + bvec->bv_page = NULL;
> + bvec->bv_len = 0;
> + bvec->bv_offset = 0;
> + bio->bi_vcnt--;
> + if (!retried_segments)
> + bio->bi_phys_segments--;
> +
> + return 0;
> }
>
> /**
> --
> Maurizio Lombardi
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html