Re: [PATCH 3/3] ide: use per-device request queue locks

From: Elias Oltmanns
Date: Wed Dec 17 2008 - 10:54:27 EST


Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx> wrote:
> [ Once again, sorry for the long delay. ]

Never mind, my responses are rather sluggish these days too.

>
> On Monday 24 November 2008, Elias Oltmanns wrote:
>> Bartlomiej Zolnierkiewicz <bzolnier@xxxxxxxxx> wrote:
[...]
>> > drivers/ide/ide-io.c | 213 +++++++++++++++---------------------------------
>> > drivers/ide/ide-park.c | 13 +-
>> > drivers/ide/ide-probe.c | 3
>> > include/linux/ide.h | 4
>> > 4 files changed, 79 insertions(+), 154 deletions(-)
>> >
>> > Index: b/drivers/ide/ide-io.c
>> > ===================================================================
>> > --- a/drivers/ide/ide-io.c
>> > +++ b/drivers/ide/ide-io.c
>> [...]
>> > @@ -780,61 +704,40 @@ repeat:
>> > */
>> > void do_ide_request(struct request_queue *q)
>> > {
>> > - ide_drive_t *orig_drive = q->queuedata;
>> > - ide_hwgroup_t *hwgroup = orig_drive->hwif->hwgroup;
>> > - ide_drive_t *drive;
>> > - ide_hwif_t *hwif;
>> > + ide_drive_t *drive = q->queuedata;
>> > + ide_hwif_t *hwif = drive->hwif;
>> > + ide_hwgroup_t *hwgroup = hwif->hwgroup;
>> > struct request *rq;
>> > ide_startstop_t startstop;
>> >
>> > - /* caller must own hwgroup->lock */
>> > - BUG_ON(!irqs_disabled());
>> > -
>> > - while (!ide_lock_hwgroup(hwgroup)) {
>> > - drive = choose_drive(hwgroup);
>> > - if (drive == NULL) {
>> > - int sleeping = 0;
>> > - unsigned long sleep = 0; /* shut up, gcc */
>> > - hwgroup->rq = NULL;
>> > - drive = hwgroup->drive;
>> > - do {
>> > - if ((drive->dev_flags & IDE_DFLAG_SLEEPING) &&
>> > - (sleeping == 0 ||
>> > - time_before(drive->sleep, sleep))) {
>> > - sleeping = 1;
>> > - sleep = drive->sleep;
>> > - }
>> > - } while ((drive = drive->next) != hwgroup->drive);
>> > - if (sleeping) {
>> > + /*
>> > + * drive is doing pre-flush, ordered write, post-flush sequence. even
>> > + * though that is 3 requests, it must be seen as a single transaction.
>> > + * we must not preempt this drive until that is complete
>> > + */
>> > + if (blk_queue_flushing(q))
>> > /*
>> > - * Take a short snooze, and then wake up this hwgroup again.
>> > - * This gives other hwgroups on the same a chance to
>> > - * play fairly with us, just in case there are big differences
>> > - * in relative throughputs.. don't want to hog the cpu too much.
>> > + * small race where queue could get replugged during
>> > + * the 3-request flush cycle, just yank the plug since
>> > + * we want it to finish asap
>> > */
>> > - if (time_before(sleep, jiffies + WAIT_MIN_SLEEP))
>> > - sleep = jiffies + WAIT_MIN_SLEEP;
>> > -#if 1
>> > - if (timer_pending(&hwgroup->timer))
>> > - printk(KERN_CRIT "ide_set_handler: timer already active\n");
>> > -#endif
>> > - /* so that ide_timer_expiry knows what to do */
>> > - hwgroup->sleeping = 1;
>> > - hwgroup->req_gen_timer = hwgroup->req_gen;
>> > - mod_timer(&hwgroup->timer, sleep);
>> > - /* we purposely leave hwgroup locked
>> > - * while sleeping */
>> > - } else
>> > - ide_unlock_hwgroup(hwgroup);
>> > + blk_remove_plug(q);
>>
>> I'm not at all convinced that this works as expected. First of all, I
>> think we can safely assume that the plug is removed when block layer
>> calls into the ->request_fn(). Secondly, since the ide layer doesn't
>> call the ->request_fn() on it's own accord, I rather suspect that this
>> check can be dropped altogether. On the other hand, I'm not sure I agree
>
> I suspect that this is leftover from the old code and I also think that
> it can be removed completely. However mixing too many real code changes
> in a single patch is not a good practice and the removal can be handled
> independently of the discussed patch.
>
> If you would send me a patch with the above change I will be happy to
> integrate it into pata tree (patch can be against current Linus' tree or
> linux-next, either one is completely fine with me).

I'll have a go at it later.

[...]
>> Finally, some more general blathering on the topic at hand: A while ago,
>> I spent some thought on the possibilities of giving the block layer a
>> notion of linked device queues as an equivalent hwgroups in ide,
>> scsi_hosts or ata_ports and let it take care of time / bandwidth
>> distribution among the queues belonging to one group. This is, as I
>> understand, pretty much what your code is relying on since you have
>> chucked out choose_drive(). However, this turned out not to be too easy
>
> This is the right way to go and I has always advocated for it. However
> after seeing how libata got away with ignoring the issue altogether
> I'm no longer sure of this. I haven't seen any bug reports which would
> indicate that simplified approach has any really negative consequences.

Well, libata can safely ignore it since scsi takes care of that (see
scsi_run_queue() which is called on command completion).

>
> [ Still would be great to have the control over bandwitch of "queue-group"
> at the block layer level since we could also use it for distributing the
> available PCI[-E] bus bandwitch. ]
>
>> and I'm not quite sure whether we really want to go down that road. One
>> major problem is that there is no straight forward way for the block
>> layer to know, whether a ->request_fn() has actually taken a request off
>> the queue and if not (or less than queue_depth anyway), whether it's
>> just the device that couldn't take any more or the controller instead.
>> On the whole, it seems not exactly trivial to get it right and it would
>> probably be a good idea to consult Jens and perhaps James before
>
> I think that having more information returned by ->request_fn() could be
> beneficial to the block layer (independently whether we end up adding
> support for "queue-groups" to the block layer or not) but this definitely
> needs to be verified with Jens & James.

Some time back, I raised this with Jens in connection with your previous
version of the patchset [1]. I didn't get an answer at the time but
perhaps it would help to raise it again in its own right and to give
some more examples of its potential merits.

>
>> embarking on such a venture. Short of that, I think that ide layer has
>> to keep an appropriate equivalent of choose_drive() and also the while
>> loop in the do_ide_request() function.
>
> Thank you for your review. v1->v2 interdiff below.
>
> v2:
> * Fixes/improvements based on review from Elias:
> - take as many requests off the queue as possible
> - remove now redundant BUG_ON()

Looks alright to me.

Regards,

Elias
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/