Re: [PATCH] Priorities in Anticipatory I/O scheduler

From: Dave Chinner
Date: Tue Oct 28 2008 - 19:31:32 EST


On Tue, Oct 28, 2008 at 03:48:44PM -0700, Naveen Gupta wrote:
> 2008/10/28 Dave Chinner <david@xxxxxxxxxxxxx>:
> > On Tue, Oct 28, 2008 at 10:14:20AM -0700, Naveen Gupta wrote:
> >> 2008/10/27 Dave Chinner <david@xxxxxxxxxxxxx>:
> >> > On Mon, Oct 27, 2008 at 12:01:32PM -0700, ngupta@xxxxxxxxxx wrote:
> >> >>
> >> >> Modifications to the Anticipatory I/O scheduler to add multiple priority
> >> >> levels. It makes use of anticipation and batching in the current
> >> >> anticipatory scheduler to implement priorities.
> > .....
> >> >> In this patch I have added a new class IOPRIO_CLASS_LATENCY to differentiate
> >> >> the notion of absolute priority from the existing time-slice based
> >> >> priority classes in cfq. Internally, though, the anticipatory scheduler maps
> >> >> all of them to best-effort levels. Hence, one can also use the various
> >> >> best-effort priority levels.
> >> >
> >> > Please don't introduce yet another incompatible behaviour between
> >> > I/O schedulers. It's bad enough from an optimisation point of view
> >> > that BIO_RW_SYNC and BIO_RW_META mean different things to different
> >> > schedulers, let alone that only CFQ currently understands
> >> > priorities. If you are going to introduce priorities into AS, then
> >> > please, please, please make it use the same interface as CFQ.
> >> >
> >> > Why? Both the extN and XFS devs have been considering bumping the
> >> > priority of journal writes using the existing CFQ-based I/O priority
> >> > mechanism - the last thing I want to see is a different scheduler
> >> > requiring a different priority configuration to achieve the same
> >> > optimisation. There is no way we can support this sort of
> >> > optimisation in the filesystem code if the interface changes when
> >> > the I/O scheduler changes. So please use the existing IOPRIO classes
> >> > to map the priorities for the AS scheduler.
> >> >
> >>
> >> The anticipatory scheduler chooses its next I/O to be of the highest
> >> available priority level.
> >
> > That sounds exactly like what the current RT class is supposed to
> > be used for - defining the absolute priority of dispatch. How
> > is this latency class different to the current RT class semantics
> > that are defined for CFQ?
> >
>
> I/O from RT class in CFQ can still see a bubble with this new latency
> class. An easy way to check this would be to submit ios at multiple
> levels both in CFQ and AS and check max latency of the highest levels.
> I will let Jens or Satoshi comment on exact algorithm for RT class.

You're missing my point entirely.

You're defining a new class that has the exact same meaning as
the current RT class definition, then mapping the BE class over
the top of that, hence changing what that means for everyone.

The fact that the *implementation* of AS and CFQ is different is
irrelevant; if you use the RT class then on CFQ you get the current
RT behaviour, if you use the RT class on AS you should get your new
priority dispatch mechanism. We don't need a new API just because
the implementations are different.
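
To make that concrete, here is a rough userspace sketch (not code from
the patch) of what "same API, different implementation" looks like from
the caller's side. It assumes Linux and the ioprio_set(2) syscall; the
sk_ prefixed names are hypothetical local copies of the kernel's
encoding, defined here only so the example is self-contained. The
caller asks for a class and a level and never needs to know which
scheduler sits underneath:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copies of the kernel's ioprio encoding: class in the top bits,
 * level in the low 13 bits. The sk_ names are this sketch's own. */
enum sk_ioprio_class {
    SK_CLASS_NONE = 0,
    SK_CLASS_RT   = 1,  /* real-time: strict priority over everything else */
    SK_CLASS_BE   = 2,  /* best-effort: the default class */
    SK_CLASS_IDLE = 3,
};
#define SK_CLASS_SHIFT 13
#define SK_WHO_PROCESS 1  /* same value as IOPRIO_WHO_PROCESS */

/* Pack a class and a level (0 = highest, 7 = lowest) into one value. */
static inline int sk_ioprio_value(enum sk_ioprio_class cls, int level)
{
    assert(level >= 0 && level <= 7);
    return ((int)cls << SK_CLASS_SHIFT) | level;
}

/* Set the I/O priority of a process (pid 0 means the caller).
 * Returns 0 on success, -1 with errno set on failure. The call is
 * identical whether the queue runs CFQ, AS, or anything else. */
static inline long sk_set_ioprio(pid_t pid, enum sk_ioprio_class cls, int level)
{
    return syscall(SYS_ioprio_set, SK_WHO_PROCESS, pid,
                   sk_ioprio_value(cls, level));
}
```

A latency sensitive job would call sk_set_ioprio(0, SK_CLASS_RT, 0) and
leave it to the active scheduler to decide how to honour that. Note that
setting the RT class normally needs elevated privileges.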

> >> So, in some sense it kind of implements absolute priority and
> >> is best used for jobs which are latency sensitive. Since the
> >> priorities can be and are mapped internally in anticipatory
> >> scheduler, BEST_EFFORT class is mapped one-one with the LATENCY
> >> class.
> >
> > So you map the BE class to something with the same semantics as
> > the RT class? What mapping do you do when an application uses
> > the RT class?
> >
>
> Yes I could have used RT class but it was used in CFQ to implement
> its time-slice based highest priority class. If an application
> uses RT class, AS maps all levels of RT class to BE class level 0
> (i.e. to the highest priority available)

Which means you are throwing away all the RT priority levels and
so an application using the RT class would be subtly broken on AS....
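
As a toy illustration of the breakage (hypothetical function names, not
taken from the patch or from either scheduler), compare a dispatch rank
that collapses every RT level onto BE level 0, as described above, with
one that keeps RT strictly above BE and preserves the levels within
each class. Lower rank means dispatched first:

```c
#include <assert.h>

enum sk_class { SK_RT = 1, SK_BE = 2 };

/* Mapping as described in the quoted text: all RT levels become BE
 * level 0, BE levels map one-to-one. */
static inline int sk_rank_collapsed(enum sk_class cls, int level)
{
    assert(level >= 0 && level <= 7);
    return cls == SK_RT ? 0 : level;
}

/* Alternative: every RT level ranks ahead of every BE level, and the
 * eight levels inside each class keep their relative order. */
static inline int sk_rank_preserving(enum sk_class cls, int level)
{
    assert(level >= 0 && level <= 7);
    return (cls == SK_RT ? 0 : 8) + level;
}
```

With the collapsed mapping, RT level 0 and RT level 7 get the same
rank, and both tie with BE level 0, so an application that relies on
the ordering of its RT levels loses it. With the preserving mapping,
RT level 7 still ranks ahead of BE level 0.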

> >> A filesystem can use best-effort class using similar interface
> >> as for cfq.
> >
> > The folk using the RT priority classes greatly objected to using
> > the RT class for journal I/O precisely because it would then
> > preempt their application's RT I/O and introduce unpredictable
> > latencies.
> >
> > Journal I/O will typically use the highest priority BE class so
> > that it is promoted above BE I/O but does not preempt RT I/O.
> > With your mapping of BE classes to this new "absolute priority
> > latency" class, this configuration will give journal I/O the
> > highest priority in the scheduler. This will cause preemption of
> > your latency sensitive I/O and so those latencies you are trying
> > to avoid won't go away....
> >
>
> I see your problem, we could make the LATENCY class different from
> and above BE class (instead of one-one mapping).

Like the RT class is currently defined to be? ;)

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx