Re: [PATCH v4 0/3] nvme power saving

From: J Freyensee
Date: Thu Sep 22 2016 - 17:33:49 EST


On Thu, 2016-09-22 at 14:43 -0600, Jens Axboe wrote:
> On 09/22/2016 02:11 PM, Andy Lutomirski wrote:
> >
> > On Thu, Sep 22, 2016 at 7:23 AM, Jens Axboe <axboe@xxxxxx> wrote:
> > >
> > >
> > > On 09/16/2016 12:16 PM, Andy Lutomirski wrote:
> > > >
> > > >
> > > > Hi all-
> > > >
> > > > > Here's v4 of the APST patch set. The biggest bikesheddable thing
> > > > > (I think) is the scaling factor. I currently have it hardcoded
> > > > > so that we wait 50x the total latency before entering a power
> > > > > saving state. On my Samsung 950, this means we enter state 3
> > > > > (70mW, 0.5ms entry latency, 5ms exit latency) after 275ms and
> > > > > state 4 (5mW, 2ms entry latency, 22ms exit latency) after
> > > > > 1200ms. I have the default max latency set to 25ms.
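
For anyone checking the numbers quoted above, here is a rough sketch of the arithmetic as I read it, not the code from the patch. The struct, the function names and the APST_LATENCY_SCALE constant are all made up for illustration, and comparing the *total* (entry + exit) latency against the cap is my assumption; the patch may compare something else.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the 50x scaling factor described above. */
#define APST_LATENCY_SCALE 50u

/* Hypothetical descriptor for one non-operational power state. */
struct ps_desc {
	uint32_t entry_lat_us;	/* entry latency in microseconds */
	uint32_t exit_lat_us;	/* exit latency in microseconds */
};

/* Total transition cost of a state: entry latency plus exit latency. */
static inline uint64_t ps_total_latency_us(const struct ps_desc *ps)
{
	return (uint64_t)ps->entry_lat_us + ps->exit_lat_us;
}

/*
 * Idle time in milliseconds to wait before entering the state:
 * scale * (entry + exit), converted from us to ms and rounded up.
 */
static inline uint64_t apst_idle_timeout_ms(const struct ps_desc *ps)
{
	return (ps_total_latency_us(ps) * APST_LATENCY_SCALE + 999) / 1000;
}

/*
 * Assumption: a state is usable only if its total latency fits under
 * the configured cap (ps_max_latency_us in the cover letter).
 */
static inline bool apst_state_allowed(const struct ps_desc *ps,
				      uint32_t max_latency_us)
{
	return ps_total_latency_us(ps) <= max_latency_us;
}
```

With the Samsung 950 figures, state 3 is 500us + 5000us = 5.5ms total, times 50 is 275ms; state 4 is 2000us + 22000us = 24ms total, times 50 is 1200ms. Both fit under the 25ms default cap.
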
> > > >
> > > > > FWIW, in practice, the latency this introduces seems to be well
> > > > > under 22ms, but my benchmark is a bit silly and I might have
> > > > > measured it wrong. I certainly haven't observed a slowdown just
> > > > > using my laptop.
> > > >
> > > > > This time around, I changed the names of parameters after Jay
> > > > > Frayensee got confused by the first try. Now they are:
> > > >
> > > > >  - ps_max_latency_us in sysfs: actually controls it.
> > > > >  - nvme_core.default_ps_max_latency_us: sets the default.
> > > >
> > > > Yeah, they're mouthfuls, but they should be clearer now.
> > >
> > >
> > > > The only thing I don't like about this is the fact that it's a
> > > > driver private thing. Similar to ALPM on SATA, it's yet another
> > > > knob that needs to be set. If we put it somewhere generic, then
> > > > at least we could potentially use it in a generic fashion.
> >
> > > Agreed. I'm hoping to hear back from Rafael soon about the
> > > dev_pm_qos thing.
> >
> > >
> > >
> > > Additionally, it should not be on by default.
> >
> > > I think I disagree with this. Since we don't have anything like
> > > laptop-mode AFAIK, I think we do want it on by default. For the
> > > server workloads that want to consume more idle power for faster
> > > response when idle, I think the servers should be willing to make
> > > this change, just like they need to disable overly deep C states,
> > > etc. (Admittedly, unifying the configuration would be nice.)
>
> I can see two reasons why we don't want it the default:
>
> 1) Changes like this have a tendency to cause issues on various types
> of hardware. How many NVMe devices have you tested this on? ALPM on
> SATA had a lot of initial problems, where it slowed down some SSDs
> unbearably.

...and some SSDs don't even support this feature yet, so the number of
different NVMe devices available to test initially will most likely be
small (like the Fultondales I have, all I could check is to see if the
code broke anything if the device did not have this power-save
feature).

I agree with Jens; it makes a lot of sense to start with this feature
'off'.

To 'advertise' the feature, maybe make the feature a new selection in
Kconfig? For example, initially mark it "EXPERIMENTAL", and later, when
more devices implement this feature, it can be integrated more tightly
into the NVMe solution and default to on.
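
Something along these lines is what I have in mind. This is only a sketch: CONFIG_NVME_APST_EXPERIMENTAL is a hypothetical symbol, not an existing Kconfig option, and the helper name is made up. It also assumes that a cap of 0us means no power-saving state qualifies, i.e. the feature is effectively off.

```c
#include <stdint.h>

/*
 * Hypothetical: CONFIG_NVME_APST_EXPERIMENTAL would be defined by the
 * build when the (not yet existing) Kconfig entry is selected.
 */
#ifdef CONFIG_NVME_APST_EXPERIMENTAL
/* Opted in: use the 25ms default cap from the cover letter. */
#define NVME_DEFAULT_PS_MAX_LATENCY_US 25000u
#else
/* Assumption: a 0us cap admits no power-saving state, so the feature is off. */
#define NVME_DEFAULT_PS_MAX_LATENCY_US 0u
#endif

/* Hypothetical helper returning the compile-time default cap in microseconds. */
static inline uint32_t nvme_default_ps_max_latency_us(void)
{
	return NVME_DEFAULT_PS_MAX_LATENCY_US;
}
```

The module parameter and the sysfs knob could still override this at runtime; the Kconfig choice would only decide what an untouched system gets.
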