Re: Fix up power managment in 2.6

From: Benjamin Herrenschmidt
Date: Tue Sep 02 2003 - 05:23:46 EST



> However, I'll let the PPC people justify the real reason for the driver
> model change, since it was /their/ requirement that caused it, and I'm
> not going to fight their battles for them. (although I seem to be doing
> exactly that while wasting my time here.)
> It's about time that the people in the PPC community, who were the main
> guys pushing for the driver model change, spoke up and justified this.

Ok, actually it was me, and it was not about PPC specifics but rather
about having sane semantics that actually mean something ;)

The whole point was to get rid of the old 2 step save_state, then
suspend model which didn't make sense. A saved state is only meaningful
as long as that state doesn't get modified afterward, so saving state
and suspending are an atomic operation.

That also means that I plan in the long run to also kill the PCI
save_state. We didn't do that yet to not harm drivers more than we
already did though ;)

Now, about that late "without interrupts" thing. Well, this is not
entirely my design idea, so I'll let Patrick speak up, as I wrote
previously, I'm not too fond of the "I return -EAGAIN" thing, on
the other hand, I wanted to kill the suspend vs. powerdown
distinction we had for a very simple reason:

Once you have suspended the whole PM tree, you can't expect a random
driver to be able to talk to it's device anymore since the "parent"
have been suspended. For example, a USB or 1394 device cannot be
"powered down" at this stage as the previous "suspend" run will have
stopped all activity on the host controller. Since I wanted the overall
PM semantics to be consistent, I asked Patrick to consider removing that
"power down" step. The fact of powering down the device is part of the
same "atomic" suspend operation, deciding wether to power down or not
the device depends solely on the target state (you typically don't do it
for S4 as you expect a complete machine shutdown afterward).

Now, we still had that need, for a few drivers (and in most cases, this
is really about system devices, so it doesn't fit in the same model
anyway, but still, we have a few broken thingies on PCI too), where a
driver need to be suspended "late" (actually powered down late) when
system interrupts have been disabled because that bit of HW is
terminally broken and will assert it's interrupt line in a non
controlled way.

That's the only reason why the powerdown call to the driver model still
exist and why Patrick kept around that mecanism of returning -EAGAIN.

So such a driver is expected to do the following:

- first suspend will actually suspend the IOs as expected (block queues
for
a block driver, stop net queue for a network driver, etc....) and
basically do everything but the actual power down of the device. Then,
it returns -EAGAIN instead of 0 to request beeing called late for
power down.

- second suspend will do the last step.

(I do agree that having the same callback called twice without an
indication of "when" in those 2 steps it is called isn't the best API we
could come up with, but I prefered that to adding a paremeter to
suspend() just for this special case).

This is only to be used on such "broken" devices and only on busses
where you know that your "host" bus is still available after the round
of "suspend" calls (and even PCI may not in some cases), so this is not
something I'd like to see people use too much.

Finally, to reply about UHCI/USB "brokenness", I've been working with
David Brownell and other USB folks lately to get that working properly,
it was not just a matter of adapting to the new callbacks, but also to
do the right thing on the host controller, hopefully, David now has some
patches that will be merged soon that implement proper PM on USB as well
(at least for the host side, I suppose a bunch of device drivers will
have to be fixed, with the new stuff, they'll just get -ESHUTDOWN when
trying to submit URBs after the host is suspended).

Pavel, please keep in mind that proper PM is a difficult task, what we
had before was full of holes, we spent a significant amount of time over
the past year debating the right way to do all of this and Patrick spent
even more time actually implementing it, so rather than just crying
loud, I'd epxect you rather help us fixing what still need to be.

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/