Re: [RFC PATCH 4/3] block: skip elevator initialization for flush requests

From: Mike Snitzer
Date: Tue Feb 01 2011 - 13:11:47 EST


On Wed, Jan 26 2011 at 5:03am -0500,
Tejun Heo <tj@xxxxxxxxxx> wrote:

> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 72dd23b..f507888 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -764,7 +764,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> > struct request_list *rl = &q->rq;
> > struct io_context *ioc = NULL;
> > const bool is_sync = rw_is_sync(rw_flags) != 0;
> > - int may_queue, priv;
> > + int may_queue, priv = 0;
> >
> > may_queue = elv_may_queue(q, rw_flags);
> > if (may_queue == ELV_MQUEUE_NO)
> > @@ -808,9 +808,14 @@ static struct request *get_request(struct request_queue *q, int rw_flags,
> > rl->count[is_sync]++;
> > rl->starved[is_sync] = 0;
> >
> > - priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
> > - if (priv)
> > - rl->elvpriv++;
> > + /*
> > + * Skip elevator initialization for flush requests
> > + */
> > + if (!(bio && (bio->bi_rw & (REQ_FLUSH | REQ_FUA)))) {
> > + priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
> > + if (priv)
> > + rl->elvpriv++;
> > + }
>
> I thought about doing it this way but I think we're burying the
> REQ_FLUSH|REQ_FUA test logic too deep. get_request() shouldn't
> "magically" know not to allocate elevator data.

There is already a considerable amount of REQ_FLUSH|REQ_FUA
special-casing magic sprinkled throughout the block layer. Why is this
get_request() change the case that goes too far?

> The decision should
> be made higher in the stack and passed down to get_request(). e.g. if
> REQ_SORTED is set in @rw, elevator data is allocated; otherwise, not.

Considering REQ_SORTED is set in elv_insert(), well after get_request()
is called, I'm not seeing what you're suggesting.

Anyway, I agree that ideally we'd have a mechanism to explicitly
short-circuit elevator initialization. But doing so in a meaningful way
would likely require a fair amount of refactoring of get_request* and
its callers. I'll come back to this and have another look, but my gut
feeling is that this interface churn wouldn't _really_ help -- all
things considered.
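
For the sake of discussion, here is a rough userspace sketch of what an
explicit short-circuit could look like: the caller decides whether
elevator data is wanted and passes that down as a flag, so the
allocation side never tests REQ_FLUSH|REQ_FUA itself. This is not
kernel code. MOCK_REQ_ELVPRIV, struct mock_queue, and the mock_*
functions are hypothetical names made up for illustration, and the flag
values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values; the real REQ_* bits live in kernel headers. */
#define MOCK_REQ_FLUSH   (1u << 0)
#define MOCK_REQ_FUA     (1u << 1)
#define MOCK_REQ_ELVPRIV (1u << 2) /* hypothetical: "allocate elevator data" */

/* Stand-in for the pieces of request_queue/request_list used here. */
struct mock_queue {
	bool elvswitch; /* models QUEUE_FLAG_ELVSWITCH being set */
	int elvpriv;    /* models rl->elvpriv */
};

/* Caller-side policy: flush/FUA requests do not ask for elevator data. */
static unsigned int mock_prepare_flags(unsigned int rw_flags)
{
	if (!(rw_flags & (MOCK_REQ_FLUSH | MOCK_REQ_FUA)))
		rw_flags |= MOCK_REQ_ELVPRIV;
	return rw_flags;
}

/*
 * Allocation side: obeys the flag it is handed and knows nothing about
 * flushes. Returns the "priv" value the real code would use to decide
 * whether to initialize elevator data.
 */
static int mock_get_request(struct mock_queue *q, unsigned int rw_flags)
{
	int priv = 0;

	assert(q != NULL);
	if (rw_flags & MOCK_REQ_ELVPRIV) {
		priv = !q->elvswitch;
		if (priv)
			q->elvpriv++;
	}
	return priv;
}
```

The accounting is the same as in the patch above; the only difference
is where the flush test lives. Whether that is worth touching every
caller is the open question.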

> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 8a082a5..0c569ec 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -99,25 +99,29 @@ struct request {
> > /*
> > * The rb_node is only used inside the io scheduler, requests
> > * are pruned when moved to the dispatch queue. So let the
> > - * flush fields share space with the rb_node.
> > + * completion_data share space with the rb_node.
> > */
> > union {
> > struct rb_node rb_node; /* sort/lookup */
> > - struct {
> > - unsigned int seq;
> > - struct list_head list;
> > - } flush;
> > + void *completion_data;
> > };
> >
> > - void *completion_data;
> > -
> > /*
> > * Three pointers are available for the IO schedulers, if they need
> > - * more they have to dynamically allocate it.
> > + * more they have to dynamically allocate it. Let the flush fields
> > + * share space with these three pointers.
> > */
> > - void *elevator_private;
> > - void *elevator_private2;
> > - void *elevator_private3;
> > + union {
> > + struct {
> > + void *private;
> > + void *private2;
> > + void *private3;
> > + } elevator;
> > + struct {
> > + unsigned int seq;
> > + struct list_head list;
> > + } flush;
> > + };
>
> Another thing is, can we please make private* an array? The number
> postfixes are irksome. It's even one based instead of zero!

Sure, I can sort that out.
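
As a rough idea of the shape, the array-based layout might look
something like the userspace sketch below. It is a sketch only, not the
next revision: struct mock_request and struct mock_list_head are
hypothetical stand-ins for struct request and struct list_head, the
field name elevator_private[] is an assumption, and the static
assertion just checks that the flush fields fit in the space of the
three pointers on the platform being compiled for.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct list_head. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Only the second union from the patch is modelled here. */
struct mock_request {
	union {
		/* zero-based array instead of private, private2, private3 */
		void *elevator_private[3];
		struct {
			unsigned int seq;
			struct mock_list_head list;
		} flush;
	};
};

/* The flush fields must not grow the request beyond three pointers. */
_Static_assert(sizeof(((struct mock_request *)0)->flush) <=
	       sizeof(void *[3]),
	       "flush fields must fit in the elevator_private[] space");

/*
 * Set up the flush view of the union. After this call the
 * elevator_private[] view holds garbage, so a request must only ever
 * be used through one view at a time.
 */
static void mock_rq_init_flush(struct mock_request *rq, unsigned int seq)
{
	assert(rq != NULL);
	rq->flush.seq = seq;
	rq->flush.list.next = &rq->flush.list;
	rq->flush.list.prev = &rq->flush.list;
}
```

The sharing only works because a request that skips elevator
initialization never touches the elevator pointers.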

> > Also, it would be great to better describe the lifetime difference
> > between the first and the second unions and why it has to be organized
> > this way (rb_node and completion_data can live together but rb_node
> > and flush can't).
>
> Oops, what can't live together are elevator_private* and
> completion_data.

I'll better describe the 2nd union's sharing in the next revision.

Mike